A Combined Approach towards Measuring Linguistic Distance: A Study on South Ethiosemitic Languages

Tekabe Legesse Feleke
2020-01-01

Abstract

The distance among closely related languages is usually measured along three dimensions: structural, functional and perceptual. Structural distance is determined by directly quantifying the phonetic, lexical, morphological and syntactic differences among the languages. Functional distance is measured on the basis of actual language use, e.g., mutual intelligibility and inter-lingual comprehensibility. Perceptual distance reflects speakers' subjective judgments about the similarity or intelligibility between their native language and neighboring related languages. Studies on language variation measure linguistic distance along at least one of these dimensions. However, as Gooskens (2018) and Tang and Heuven (2009) noted, languages do not differ in just one dimension; they can be, for example, phonetically similar but syntactically different. The present study therefore combined these three perspectives to examine the distance among ten purposively selected South Ethiosemitic languages (Chaha, Endegagn, Ezha, Gumer, Gura, Inor, Kistane, Mesqan, Muher and Silt'e). The study aims to (1) determine the areal classification of the languages; (2) illustrate the similarities and differences between this areal classification and previous classifications by historical linguists; (3) determine the degree of mutual intelligibility among the languages; (4) examine the relationship among the three dimensions of linguistic distance; and (5) explore the major determinants (linguistic and non-linguistic) that contribute to the linguistic distance among the languages. The structural distance was determined by computing the lexical and phonetic differences based on 240 randomly selected words. The lexical distance was defined as the average proportion of non-cognate pairs in the basic vocabularies. The Levenshtein algorithm (Heeringa, 2004; Kessler, 1995) was used to compute the phonetic distance. 
The phonetic distance was defined as the number of operations required to transform one sequence of phones into another. A Semantic Word Categorization test was adapted from Tang and Heuven (2009) to measure the functional distance. A self-rating test, based on recordings of 'The North Wind and the Sun', was administered to determine the perceptual distance among the languages. With regard to the linguistic determinants, the degree of diffusion of phonetic and lexical features was estimated using Neighbor-net network representation and lexicostatistical skewing. The study also examined the influence of four non-linguistic determinants: geographical distance, population size, the degree of contact among the speakers, and language attitude. Gabmap was used for clustering and cluster validation; multidimensional scaling and fuzzy clustering were employed for the validation. The classifications obtained from each of the distance matrices were compared with the previous classifications by historical linguists on the basis of the cophenetic distance among various sub-groupings. The results of the cluster analysis show that the ten selected South Ethiosemitic language varieties can reasonably be grouped into five clusters: {Chaha, Ezha, Gumer, Gura}, {Mesqan, Muher}, {Endegagn, Inor}, {Kistane} and {Silt'e}. This classification is very similar to the classifications previously proposed by historical linguists (e.g., Hetzron, 1972, 1977). There is also a very strong correlation among the measures of the three dimensions of distance. However, these measures differ in reliability: the structural distance is the most reliable measure, while the perceptual distance is the least reliable. Furthermore, the Word Categorization test results show that many of these languages are mutually intelligible; Silt'e, however, is not mutually intelligible with any of the other languages investigated in the present study. 
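The two structural measures described above can be sketched in Python as follows. This is only an illustrative sketch: the unit cost for every edit operation, the normalization by the length of the longer sequence, and the toy phone sequences and cognate judgments are all assumptions made here, not details taken from the thesis, which may weight or normalize the distances differently.

```python
from typing import Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Minimum number of insertions, deletions and substitutions
    needed to turn phone sequence a into phone sequence b."""
    previous = list(range(len(b) + 1))
    for i, phone_a in enumerate(a, start=1):
        current = [i]
        for j, phone_b in enumerate(b, start=1):
            cost = 0 if phone_a == phone_b else 1  # assumed unit cost
            current.append(min(
                previous[j] + 1,         # delete phone_a
                current[j - 1] + 1,      # insert phone_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def phonetic_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Levenshtein distance divided by the longer sequence length
    (an assumed normalization), giving a value between 0 and 1."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def lexical_distance(is_cognate: Sequence[bool]) -> float:
    """Share of compared word pairs judged to be non-cognate."""
    if not is_cognate:
        raise ValueError("at least one word pair is required")
    return sum(not flag for flag in is_cognate) / len(is_cognate)


# Invented toy data, not forms from any of the ten languages.
toy_a = ["k", "a", "t", "a"]
toy_b = ["k", "a", "d"]
print(phonetic_distance(toy_a, toy_b))              # 0.5
print(lexical_distance([True, True, False, True]))  # 0.25
```

Averaging `phonetic_distance` over all compared word pairs for two varieties would give one cell of a distance matrix of the kind that clustering tools take as input.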
The results obtained from the analysis of the linguistic determinants show that the similarity among the language varieties is mainly the result of contact among the languages. Moreover, the analysis of the non-linguistic variables indicates a strong positive correlation between geographical distance and linguistic distance, and a positive contribution of contact among the speakers. Nevertheless, there is no significant correlation between linguistic distance and population size. Finally, among the three dimensions of linguistic distance, perceptual distance is the one most affected by the attitude of the speakers.
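A correlation between two distance matrices, such as geographical and linguistic distance, can be illustrated with the Python sketch below. The abstract does not say which procedure was used; the Mantel permutation test shown here is a common choice for distance matrices and is an assumption, as are the function names and the invented four-variety matrices.

```python
import random
from math import sqrt
from typing import List, Tuple

Matrix = List[List[float]]


def upper_triangle(m: Matrix) -> List[float]:
    """Flatten the cells above the diagonal of a symmetric matrix."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mantel(a: Matrix, b: Matrix, permutations: int = 999,
           seed: int = 0) -> Tuple[float, float]:
    """Return (observed r, one-sided permutation p-value).

    The variety labels of b are shuffled to build the null
    distribution, because the cells of a distance matrix are
    not independent observations.
    """
    rng = random.Random(seed)
    n = len(a)
    flat_a = upper_triangle(a)
    observed = pearson(flat_a, upper_triangle(b))
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        shuffled = [[b[order[i]][order[j]] for j in range(n)]
                    for i in range(n)]
        if pearson(flat_a, upper_triangle(shuffled)) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Invented toy matrices for four hypothetical varieties.
geo_km = [[0, 10, 20, 30],
          [10, 0, 10, 20],
          [20, 10, 0, 10],
          [30, 20, 10, 0]]
linguistic = [[cell * 0.01 for cell in row] for row in geo_km]
r, p = mantel(geo_km, linguistic)
print(r, p)
```

In the toy data the linguistic distances are an exact multiple of the geographical ones, so `r` comes out at essentially 1; with real data the permutation p-value indicates how often a correlation that strong arises from relabeling alone.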

Keywords: Combined Approach, Linguistic Distance, Mutual Intelligibility, South Ethiosemitic Languages

Description: PhD Dissertation on Measuring Linguistic Distance: South Ethiosemitic Languages
Type: Doctoral thesis

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1017111