Collocations et terminologie : extraction semi-automatique et classement des collocations dans le domaine du commerce international

Silvia Calvi
2022-01-01

Abstract

This thesis focuses on the syntagmatic relations between terms by studying a specific type of phraseological unit, the collocation. Representing collocations in a terminology database is not straightforward because of several theoretical and practical issues concerning their definition, automatic extraction and classification. To address these issues, this study is organised in five chapters. The first chapter defines the notion of idiomaticity, while the second chapter focuses on collocations, in general language and in terminology. This path leads to the definition of collocation adopted in this terminological study, a definition that, inspired by Explanatory and Combinatorial Lexicology (Mel'čuk et al. 1995), is based on quantitative and qualitative criteria: the collocation is a recurrent, semi-constrained phraseological unit composed of two elements, the base (a term of the domain of reference) and the collocate, which are connected by a semantic-syntactic link, i.e. by a lexical function (Mel'čuk, Polguère 2021). After this definition, the third chapter describes the automatic extraction of collocations, presenting several techniques and tools. Chapter 4 first presents the corpus on which this study is based, the DIACOM-fr 1985-2020 international trade corpus compiled within the DIACOM project of the Department of Foreign Languages and Literatures of the University of Verona, together with the text preprocessing step, and then describes the methodology for automatic collocation extraction adopted in this study. It is a hybrid methodology developed ad hoc using two tools, Stanza and TermoStat Web 3.0. It is based on qualitative and quantitative criteria: word pairs of the types Noun + Adjective, Noun + Preposition + Noun, Noun (subject) + Verb and Verb + Noun (complement) that have a high frequency and a high association strength are extracted; moreover, since the study is concerned with the collocations of international trade, only pairs with a specificity score ≥ 1.96 are retained. The analysis of the results reveals a high percentage of noise in the data, which makes manual filtering necessary. The last chapter describes and analyses the 951 collocations from the international trade domain. This method of analysis will allow, in the future, the automatic integration of the extracted and analysed collocations into the DIACOM-fr terminological database, which is being built at the University of Verona.
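
The filtering criteria named in the abstract (syntactic pattern, frequency, association strength, specificity ≥ 1.96) can be illustrated with a minimal sketch. It is not the thesis's actual pipeline: the abstract does not say which association measure is used, so pointwise mutual information (PMI) is assumed here. The pattern labels, the `min_freq` and `min_pmi` thresholds, the function name and the sample data are all hypothetical, and specificity scores are assumed to be precomputed per base term.

```python
import math
from collections import Counter

# Hypothetical labels for the four pair types listed in the abstract.
ALLOWED_PATTERNS = {"NOUN+ADJ", "NOUN+ADP+NOUN", "NOUN(subj)+VERB", "VERB+NOUN(obj)"}


def extract_candidates(pairs, specificity, min_freq=2, min_pmi=1.0, min_spec=1.96):
    """Filter (base, collocate, pattern) occurrences into collocation candidates.

    pairs: iterable of (base, collocate, pattern) tuples, one per occurrence.
    specificity: dict mapping a base term to its specificity score
                 (assumed to be computed elsewhere).
    min_freq and min_pmi are illustrative thresholds, not values from the thesis.
    """
    # Qualitative criterion: keep only the allowed syntactic patterns.
    pairs = [p for p in pairs if p[2] in ALLOWED_PATTERNS]
    total = len(pairs)
    pair_freq = Counter(pairs)
    base_freq = Counter(base for base, _, _ in pairs)
    coll_freq = Counter(coll for _, coll, _ in pairs)

    kept = []
    for (base, coll, pattern), freq in pair_freq.items():
        # Quantitative criterion 1: the pair must be recurrent.
        if freq < min_freq:
            continue
        # Domain criterion: the base must be specific to the domain corpus.
        if specificity.get(base, 0.0) < min_spec:
            continue
        # Quantitative criterion 2: association strength (PMI is an assumption).
        pmi = math.log2(freq * total / (base_freq[base] * coll_freq[coll]))
        if pmi >= min_pmi:
            kept.append((base, coll, pattern, freq, round(pmi, 2)))
    # Most frequent candidates first, then alphabetical for a stable order.
    return sorted(kept, key=lambda r: (-r[3], r[0], r[1]))


# Invented toy data, for illustration only.
sample_pairs = (
    [("balance", "commercial", "NOUN+ADJ")] * 3
    + [("balance", "grand", "NOUN+ADJ")]
    + [("droit", "douane", "NOUN+ADP+NOUN")] * 2
    + [("chose", "petit", "NOUN+ADJ")] * 3
    + [("balance", "ouvrir", "OTHER")] * 2
)
sample_specificity = {"balance": 5.2, "droit": 3.1, "chose": 0.4}

for row in extract_candidates(sample_pairs, sample_specificity):
    print(row)
```

A filter of this kind still lets noise through, which is consistent with the manual filtering step mentioned in the abstract.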
2022
terminology, collocations, term extraction, international trade, lexical network
Files in this record:
Calvi Silvia 2022 Collocations et terminologie extraction semi-automatique et classement des collocations dans le domaine du commerce international.pdf

Open access from 01/07/2023

Type: Doctoral thesis
License: Creative Commons
Size: 6.93 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1061691