Just Accepted Manuscripts
Articles

BHGPT: Multiple AI Models for Built Heritage Knowledge Retrieval

Cassia De Lian Cui
Sapienza University of Rome
Antonio Fioravanti
Sapienza University of Rome
Edoardo Currà
Sapienza University of Rome

Published 2026-06-04

Keywords

  • BHGPT
  • Fine-tuning
  • AI Assistant
  • Built Heritage
  • Knowledge Management

Abstract

BHGPT (Built Heritage Generative Pretrained Transformers) is an AI-driven framework designed to improve knowledge retrieval and management in the built heritage context. Traditional methods struggle with heterogeneous sources, inconsistencies, and evolving historical data. This study therefore explores three configurations (a customized GPT, a Fine-Tuned BHGPT, and an AI Assistant with Retrieval-Augmented Generation, or RAG) to enhance the accessibility and interpretability of heritage information. It adopts a domain-knowledge-first, operational approach in which discipline-specific concepts and terminology guide the organization of information and the structuring of model outputs. The approach begins with a customized GPT model and prompt engineering to test the model's response behavior. The model is then fine-tuned on specific datasets to improve domain-specific accuracy. Finally, the AI Assistant with RAG integrates structured HBIM data and unstructured archival sources, enabling dynamic querying and cross-referencing of historical and architectural knowledge. The framework is tested on the Sanctuary of Hercules and the Former Segrè Papermill in Tivoli, and its performance is evaluated across historical evolution, architectural aspects, and interdisciplinary knowledge. Results indicate that the Fine-Tuned BHGPT significantly improves site-specific knowledge extraction, while the AI Assistant with RAG provides the most flexible and adaptive responses by linking multiple data sources. The accuracy of the AI Assistant, however, depends on data availability and retrieval mechanisms. Overall, the contribution clarifies the utility of the BHGPT framework for heritage professionals: it enhances accessibility through natural language querying, produces structured and interoperable outputs, and supports transparent, source-grounded interpretation.
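
As an illustration of the retrieval step described above, the following is a minimal sketch of how structured HBIM-style records and unstructured archival notes might be pooled, ranked against a question, and passed to a model with source tags. It is not the authors' implementation. The record fields, identifiers, sample texts, and function names are all hypothetical placeholders, and simple word overlap stands in for whatever retrieval mechanism the framework actually uses.

```python
# Minimal sketch of retrieval-augmented prompting over two source types.
# All records, field names, and function names are hypothetical placeholders,
# not data or code from the study.
import re
from dataclasses import dataclass


@dataclass
class Passage:
    source: str  # "HBIM" (structured) or "archive" (unstructured)
    ref: str     # identifier the answer can cite
    text: str


def hbim_to_passages(records):
    """Flatten structured HBIM-like records into citable text passages."""
    return [
        Passage("HBIM", r["id"],
                " ".join(f"{k}: {v}" for k, v in r.items() if k != "id"))
        for r in records
    ]


def tokens(text):
    """Lowercased word set used for the toy overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, passages, k=2):
    """Rank passages by word overlap with the query (stand-in for embeddings)."""
    q = tokens(query)
    scored = [(len(q & tokens(p.text)), p) for p in passages]
    scored = [(s, p) for s, p in scored if s > 0]
    scored.sort(key=lambda sp: sp[0], reverse=True)
    return [p for _, p in scored[:k]]


def build_prompt(query, hits):
    """Assemble a source-tagged context so the answer can cite its evidence."""
    context = "\n".join(f"[{p.source}:{p.ref}] {p.text}" for p in hits)
    return (
        "Answer using only the sources below and cite them.\n"
        f"{context}\nQuestion: {query}"
    )


# Placeholder inputs: one structured list and one free-text note.
hbim_records = [
    {"id": "wall-01", "type": "wall", "material": "placeholder masonry", "phase": "phase A"},
    {"id": "roof-02", "type": "roof", "material": "placeholder timber", "phase": "phase B"},
]
archive_notes = [
    Passage("archive", "doc-07",
            "Placeholder note describing repairs to the roof in phase B."),
]

corpus = hbim_to_passages(hbim_records) + archive_notes
question = "Which phase does the roof belong to?"
hits = retrieve(question, corpus)
prompt = build_prompt(question, hits)
print(prompt)
```

In this toy run the top two passages come from different source types, which is the cross-referencing behavior the abstract attributes to the AI Assistant with RAG. The sketch also makes the stated limitation concrete: if a relevant record is missing from the corpus or is ranked below the cutoff `k`, the model never sees it.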
