Linguistic data science lies at the intersection of several fields: computer science, mathematics and statistics, and linguistics. While (traditional) data science relies primarily on the statistical dimension of data, linguistic data science also requires mechanisms for explicit representation of language data, in order to address not only "plain" text but also the linguistic features of the data (grammatical categories, word senses, etc.). This helps address complex questions for which a purely statistical approach is insufficient.
This master's degree responds to the need for training at this intersection of disciplines. It combines teaching in a new field of study at excellent higher education institutions, with mobility as an essential feature of the program.
Students will explore the specificities of linguistic data in both small and big data contexts. Implicit representation of linguistic aspects (distributional semantics, embeddings, latent semantic analysis, etc.) will be studied, as well as explicit representation of linguistic data (terminologies, dictionaries, ontologies, annotated corpora, etc.). Students will be trained in machine learning and deep learning techniques for use in the analysis and processing of lexical and textual data, as well as in the Semantic Web and linguistic linked data.
You can find more details at the Master's website: https://emlds.fcsh.unl.pt/
Given the rapidly growing adoption of Artificial Intelligence (AI) techniques in general, and Natural Language Processing (NLP) in particular, studying linguistic data science means positioning yourself at the heart of this revolution. It means joining those who are designing the future of human-machine communication, with linguistic sensitivity and technical rigor. Current AI models not only process words, but also need to understand syntactic, semantic, and pragmatic structures. This requires experts capable of representing language computationally and tackling complex problems such as semantic ambiguity, automatic text generation in specific contexts, machine translation, and overcoming language barriers on the Web.
Two student profiles are expected:
1) graduates in scientific and technical fields, with skills in programming, mathematics, and statistics; and
2) graduates in humanities and language sciences, with skills in linguistics and social sciences.
The program will equip students with skills that complement and go beyond their original training, emphasizing interdisciplinarity and encouraging collaborative activities and practices with students from diverse backgrounds.
The graduate profile will have the following characteristics and skills:
This profile will allow graduates to apply their knowledge of linguistic data science in industry or in academic research, preparing them to join teams in companies, public administrations, or universities.