CATALOGUE OF RESEARCH PRODUCTS

Collocations et terminologie : extraction semi-automatique et classement des collocations dans le domaine du commerce international

Silvia Calvi

2022-01-01

Abstract

This thesis examines syntagmatic relations between terms by studying a specific type of phraseological unit, the collocation. Representing collocations in a terminology database is far from straightforward because of several theoretical and practical issues concerning their definition, automatic extraction and classification. To address these issues, the study is organised in five chapters. The first chapter defines the notion of idiomaticity, while the second focuses on collocations in general language and in terminology. This leads to the definition of collocation adopted in this terminological study, which is inspired by Explanatory and Combinatorial Lexicology (Mel'čuk et al. 1995) and based on quantitative and qualitative criteria: a collocation is a recurrent, semi-constrained phraseological unit composed of two elements, the base (a term of the domain of reference) and the collocate, which are connected by a semantic-syntactic link, i.e. by a lexical function (Mel'čuk, Polguère 2021). With collocation defined, the third chapter describes its automatic extraction, presenting several techniques and tools. Against this background, chapter 4 first presents the corpus on which the study is based, the DIACOM-fr 1985-2020 international trade corpus compiled within the DIACOM project of the Department of Foreign Languages and Literatures of the University of Verona, and the text preprocessing step, and then describes the methodology for automatic collocation extraction adopted here. It is a hybrid methodology developed ad hoc using two tools, Stanza and TermoStat Web 3.0, and based on qualitative and quantitative criteria: word pairs of the types Noun + Adjective, Noun + Preposition + Noun, Noun (subject) + Verb and Verb + Noun (complement) that have a high frequency and a high strength of association are extracted; moreover, since the focus is on the collocations of international trade, only pairs with a specificity score ≥ 1.96 are retained. Analysis of the results reveals a high percentage of noise in the data, which makes manual filtering necessary. The last chapter describes and analyses the 951 collocations from the international trade domain. In the future, this method of analysis will allow the extracted and analysed collocations to be entered automatically into the DIACOM-fr terminological database, which is being built at the University of Verona.
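
The filtering criteria named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: it assumes the candidate pairs have already been produced by a dependency parser and that specificity scores have already been computed by a term extractor. The abstract does not name the association measure, so pointwise mutual information (PMI) is used here as a stand-in. The function name, the pattern labels and the frequency and PMI thresholds are hypothetical, and looking up the specificity score on the base is also an assumption; only the 1.96 cut-off comes from the abstract.

```python
import math
from collections import Counter

# Syntactic patterns listed in the abstract (the label strings are illustrative).
ALLOWED_PATTERNS = {"N+ADJ", "N+PREP+N", "N(subj)+V", "V+N(compl)"}


def filter_collocations(pairs, specificity, min_freq=2, min_pmi=0.5,
                        min_specificity=1.96):
    """Return candidate (base, collocate) pairs that pass all filters.

    pairs: iterable of (base, collocate, pattern) tuples, assumed to come
        from a dependency parse of the corpus.
    specificity: dict mapping each base to a precomputed specificity score.
    min_freq and min_pmi are hypothetical thresholds; only the 1.96
    specificity cut-off comes from the abstract.
    """
    # 1. Qualitative filter: keep only the allowed syntactic patterns.
    kept = [p for p in pairs if p[2] in ALLOWED_PATTERNS]
    total = len(kept)
    pair_freq = Counter((b, c, pat) for b, c, pat in kept)
    base_freq = Counter(b for b, _, _ in kept)
    coll_freq = Counter(c for _, c, _ in kept)

    results = []
    for (base, coll, pat), freq in pair_freq.items():
        # 2. Quantitative filters: frequency and association strength.
        #    PMI is an assumed stand-in for the unnamed association measure.
        pmi = math.log2(freq * total / (base_freq[base] * coll_freq[coll]))
        if freq < min_freq or pmi < min_pmi:
            continue
        # 3. Domain filter: the base must be specific enough to the domain.
        if specificity.get(base, 0.0) < min_specificity:
            continue
        results.append((base, coll, pat, freq, round(pmi, 3)))
    # Strongest associations first.
    return sorted(results, key=lambda r: r[4], reverse=True)


# Toy data, invented for illustration only.
sample_pairs = (
    [("marché", "intérieur", "N+ADJ")] * 3
    + [("marché", "grand", "N+ADJ")]
    + [("maison", "grand", "N+ADJ")] * 2
    + [("prix", "baisser", "N(subj)+V")] * 2
    + [("très", "bien", "ADV+ADV")] * 4  # pattern not in the list, dropped
)
sample_specificity = {"marché": 5.2, "prix": 3.1, "maison": 0.4}  # invented scores

result = filter_collocations(sample_pairs, sample_specificity)
for row in result:
    print(row)
```

On this toy data the sketch keeps prix + baisser and marché + intérieur. It drops marché + grand for low frequency and weak association, and maison + grand because its base falls below the specificity cut-off. As the abstract notes, output of this kind still contains noise and needs manual filtering.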

Year degree awarded: 2022
Keywords: terminology, collocations, term extraction, international trade, lexical network
Appears in types: 07.13 Doctoral Thesis

Files in this product:

File	Size	Format
Calvi Silvia 2022 Collocations et terminologie extraction semi-automatique et classement des collocations dans le domaine du commerce international.pdf (Open Access from 01/07/2023; Type: Doctoral thesis; License: Creative Commons)	6.93 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1061691
