TACTICIAN: AI-based applications knowledge extraction from ESA’s mission scientific publications

  1. Giannakis, Omiros
  2. Demiros, Iason
  3. Koutroumbas, Konstantinos
  4. Rontogiannis, Athanasios
  5. Antonopoulos, Vassilis
  6. De Marchi, Guido
  7. Arviset, Christophe
  8. Balasis, George
  9. Daglis, Athanasios
  10. Vasalos, George
  11. Boutsi, Zoe
  12. Tauber, Jan
  13. Lopez-Caniego, Marcos
  14. Kidger, Mark
  15. Masson, Arnaud
  16. Escoubet, Philippe
Proceedings:
EGU General Assembly 2023

Year of publication: 2023

Conference: EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023

Type: Conference contribution

DOI: 10.5194/EGUSPHERE-EGU23-1902 (open access via publisher)

Abstract

Scientific publications in space science contain valuable and extensive information regarding the links and relationships between the data interpreted by the authors and the associated observational elements (e.g., instrument or experiment names, observing times, etc.). In this reality of scientific information overload, researchers are often overwhelmed by an enormous and continuously growing number of articles to access in their daily activities. The exploration of recent advances concerning specific topics, methods and techniques, the review and evaluation of research proposals, and in general any action that requires a cautious and comprehensive assessment of scientific literature has turned into an extremely complex and time-consuming task.

The availability of Natural Language Processing (NLP) tools able to extract information from unstructured scientific text and turn it into highly organized and interconnected knowledge is fundamental to the use of scientific information. Exploiting the knowledge contained in scientific publications requires state-of-the-art NLP. The semantic interpretation of scientific texts can support the development of a varied set of applications such as information retrieval from the texts, linking to existing knowledge repositories, topic classification, semi-automatic assessment of publications and research proposals, tracking of scientific and technological advances, scientific intelligence-assisted reporting, review writing, and question answering.

The main objectives of TACTICIAN are to introduce Artificial Intelligence (AI) techniques to the textual analysis of the publications of all ESA Space Science missions, to monitor and evaluate the scientific productivity of the science missions, and to integrate the scientific publications’ metadata into the ESA Space Science Archive.
Through TACTICIAN, we extract lexical, syntactic, and semantic information from the scientific publications by applying NLP and Machine Learning (ML) algorithms and techniques. Utilizing the wealth of publications, we have created valuable scientific language resources, such as labeled datasets and word embeddings, which were used to train Deep Learning models that assist us in most of the language understanding tasks. In the context of TACTICIAN, we have devised methodologies and developed algorithms that can assign scientific publications to the Mars Express, Herschel, and Cluster ESA science missions and identify selected named entities and observations in these scientific publications. We also introduced a new unsupervised ML technique, based on Nonnegative Matrix Factorization (NMF), for classifying the Planck mission scientific publications into categories according to their use of the Planck data products.

These methodologies can be applied to any other mission. The combination of NLP and ML constitutes a general basis that has proved able to assist in establishing links between the missions’ observations and the scientific publications and in classifying the publications into categories with high accuracy.
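
The abstract does not describe the NMF-based technique in detail, so the following is only a generic illustration of how NMF can group documents without labels, not the TACTICIAN method. It is a minimal sketch that assumes a small publication-by-term count matrix and uses the standard multiplicative update rules for the squared-error objective. The names `nmf`, `assign_categories`, and `TOY_COUNTS`, and the toy data itself, are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def nmf(v, rank, iterations=300, seed=0, eps=1e-9):
    """Approximate a nonnegative matrix V (documents x terms) as W @ H.

    W is documents x rank and H is rank x terms. Both are updated with
    multiplicative rules, which keep every entry nonnegative.
    """
    rng = random.Random(seed)
    n_docs, n_terms = len(v), len(v[0])
    # Strictly positive random start (an assumption of this sketch).
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_docs)]
    h = [[rng.random() + 0.1 for _ in range(n_terms)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_terms)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n_docs)]
    return w, h


def assign_categories(w):
    """Label each document with the index of its largest factor weight."""
    return [max(range(len(row)), key=row.__getitem__) for row in w]


# Hypothetical toy data: rows are publications, columns are term counts.
# Publications 0-2 share one vocabulary, publications 3-5 share another.
TOY_COUNTS = [
    [3, 2, 1, 0, 0, 0],
    [6, 4, 2, 0, 0, 0],
    [3, 2, 1, 0, 0, 0],
    [0, 0, 0, 1, 2, 3],
    [0, 0, 0, 2, 4, 6],
    [0, 0, 0, 1, 2, 3],
]

if __name__ == "__main__":
    w, h = nmf(TOY_COUNTS, rank=2)
    print(assign_categories(w))
```

In this sketch each row of `W` gives a publication's weight on each latent factor, and the category is simply the factor with the largest weight. The factor indices are arbitrary, so only the grouping is meaningful, not the label numbers.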

Funding information

This work has received funding from the European Space Agency under the "ArTificiAl intelligenCe To lInk publiCations wIth observAtioNs (TACTICIAN)" activity under ESA Contract No 4000128429/19/ES/JD.

Funders