Optimizing tourism data extraction and analysis: A comprehensive methodology
- José Javier Galán-Hernández
- Ramón Alberto Carrasco-González
- Gabriel Marín-Díaz
- Antonio J. Guevara Plaza (ed.)
- Alfonso Cerezo Medina (ed.)
- Enrique Navarro Jurado (ed.)
Publisher: Springer Switzerland
ISBN: 978-3-031-52607-7, 978-3-031-52606-0
Year of publication: 2024
Pages: 37-46
Congress: TuriTec: Congreso Nacional Turismo y Tecnologías de la Información y las Comunicaciones (14. 2023. Málaga)
Type: Conference paper
Abstract
Various sources provide tourism data. However, this data is sometimes unstructured or held in sources that do not lend themselves to easy, automatic, or unsupervised collection. In such situations, a methodology based on data science techniques offers researchers a significant advantage: they can use the tools it provides to extract, process, and analyze information efficiently. Although the methodology is applicable to various disciplines, this work presents a specific case focused on tourism in Spain.
Methodology: Using data science techniques such as graph analysis and unsupervised machine learning, we collect and process data on the origins and numbers of tourists in Spain with Python, R, and VOSviewer. The analysis uncovers the primary tourism sources and origin-country patterns. Andalusia is examined in greater depth because of its high tourist influx.
Results: Our study reveals key Spanish tourism sources and visitor behavior patterns. Visualizations illustrate tourist origins, visit numbers, and interactions. Andalusia is also examined in detail for visit counts and origin countries.
Conclusions: Using data science, our study yields insights into Spanish tourism, identifying core sources and clarifying origin-country interactions. These findings can inform strategic decisions and support Spain’s tourism promotion and management.