Misplaced Pages

Automated Similarity Judgment Program

Article snapshot taken from Wikipedia under the Creative Commons Attribution-ShareAlike license.

45-404: The Automated Similarity Judgment Program (ASJP) is a collaborative project applying computational approaches to comparative linguistics using a database of word lists. The database is open access and consists of 40-item basic-vocabulary lists for well over half of the world's languages. It is continuously being expanded. In addition to isolates and languages of demonstrated genealogical groups,

90-441: A $ mark follows three consonants so that they are considered to be in the same position. ndy$im is considered similar to nim, dam and yim. " marks the preceding consonant as glottalized.

Comparative linguistics

Comparative linguistics is a branch of historical linguistics that is concerned with comparing languages to establish their historical relatedness. Genetic relatedness implies

135-547: A [ʔ] or a [ˀ] , a glottal stop modifier, then one speaks of pre-glottalization or glottal reinforcement. This is common in some varieties of English , RP included; /t/ and /tʃ/ are the most affected but /p/ and /k/ also regularly show pre-glottalization. In the English dialects exhibiting pre-glottalization, the consonants in question are usually glottalized in the coda position: "what" [ˈwɒʔt] , "fiction" [ˈfɪʔkʃən] , "milkman" [ˈmɪɫʔkmən] , "opera" [ˈɒʔpɹə] . To

180-615: A proto-language, to investigate sound symbolism, to evaluate different phylogenetic methods, and several other purposes. ASJP is not widely accepted among historical linguists as an adequate method to establish or evaluate relationships between language families. It is part of the Cross-Linguistic Linked Data project hosted by the Max Planck Institute for the Science of Human History. ASJP

225-550: A common origin or proto-language and comparative linguistics aims to construct language families , to reconstruct proto-languages and specify the changes that have resulted in the documented languages. To maintain a clear distinction between attested and reconstructed forms, comparative linguists prefix an asterisk to any form that is not found in surviving texts. A number of methods for carrying out language classification have been developed, ranging from simple inspection to computerised hypothesis testing. Such methods have gone through

270-499: A long process of development. The fundamental technique of comparative linguistics is to compare phonological systems, morphological systems, syntax and the lexicon of two or more languages using techniques such as the comparative method . In principle, every difference between two related languages should be explicable to a high degree of plausibility; systematic changes, for example in phonological or morphological systems are expected to be highly regular (consistent). In practice,

315-547: A series of light implosives or voiced consonants with glottal reinforcement. The airstream parameter is only known to be relevant to obstruents, but the first two are involved with both obstruents and sonorants, including vowels. When a phoneme is completely substituted by a glottal stop [ʔ] , one speaks of glottaling or glottal replacement. This is, for instance, very common in British English dialects such as Cockney and Estuary English dialects. In these dialects,

360-501: A simultaneous single segment [d̰] to an onset or coda such as [ˀd] or [dˀ] to a sequence such as [ʔd] or [dʔ] . Full or partial closure of the glottis also allows glottalic airstream mechanisms to operate, producing ejective or implosive consonants; implosives may themselves have modal, stiff, or creaky voice. It is not always clear from linguistic descriptions if a language has a series of light ejectives or voiceless consonants with glottal reinforcement, or similarly if it has

405-404: A single language, with comparison of word variants, to perform the same function. Internal reconstruction is more resistant to interference but usually has a limited available base of utilizable words and is able to reconstruct only certain changes (those that have left traces as morphophonological variations). In the twentieth century an alternative method, lexicostatistics , was developed, which

450-467: A type of consonant attested in no Indo-European language known at the time. The hypothesis was vindicated with the discovery of Hittite , which proved to have exactly the consonants Saussure had hypothesized in the environments he had predicted. Where languages are derived from a very distant ancestor, and are thus more distantly related, the comparative method becomes less practicable. In particular, attempting to relate two reconstructed proto-languages by

495-516: A way that is considered pseudoscientific by specialists (e.g. spurious comparisons between Ancient Egyptian and languages like Wolof , as proposed by Diop in the 1960s ). The most common method applied in pseudoscientific language comparisons is to search two or more languages for words that seem similar in their sound and meaning. While similarities of this kind often seem convincing to laypersons, linguistic scientists consider this kind of comparison to be unreliable for two primary reasons. First,

540-583: Is common even among RP speakers. Geordie English has a unique form of glottalization involving glottal reinforcement of t, k, and p, for example in "matter", "lucky", and "happy". T, k, p sounds between vowels are pronounced simultaneously with a glottal stop represented in IPA as p͡ʔ, k͡ʔ and t͡ʔ. Glottal replacement occurs in Indonesian , where syllable final /k/ is produced as a glottal stop. In every Gorontalic language except Buol and Kaidipang , *k

585-430: Is mainly associated with Morris Swadesh but is based on earlier work. This uses a short word list of basic vocabulary in the various languages for comparisons. Swadesh used 100 (earlier 200) items that are assumed to be cognate (on the basis of phonetic similarity) in the languages being compared, though other lists have also been used. Distance measures are derived by examination of language pairs but such methods reduce

630-529: Is seldom applied today. Dating estimates can now be generated by computerised methods that have fewer restrictions, calculating rates from the data. However, no mathematical means of producing proto-language split-times on the basis of lexical retention has been proven reliable. Another controversial method, developed by Joseph Greenberg , is mass comparison . The method, which disavows any ability to date developments, aims simply to show which languages are more and less close to each other. Greenberg suggested that

675-717: The IPA : (a) the same way as ejectives , with an apostrophe; or (b) with the under-tilde for creaky voice. For example, the Yapese word for "sick" with a glottalized m could be transcribed as either [mʼaar] or [m̰aar] . (In some typefaces, the apostrophe will occur above the m.) Glottalization varies along three parameters, all of which are continuums. The degree of glottalization varies from none ( modal voice , [d] ) through stiff voice ( [d̬] ) and creaky voice ( [d̰] ) to full glottal closure (glottal reinforcement or glottal replacement, described below). The timing also varies, from

720-737: The Turanian or Ural–Altaic language group, which relates Sami and other languages to the Mongolian language, was used to justify racism towards the Sami in particular. There are also strong similarities, albeit areal rather than genetic, between the Uralic and Altaic languages, which provided an innocent basis for this theory. In 1930s Turkey, some promoted the Sun Language Theory, which purported to show that Turkic languages were close to

765-405: The glottis ; another way to describe this phenomenon is to say that a glottal stop is made simultaneously with another consonant . In certain cases, the glottal stop can even wholly replace the voiceless consonant. The term 'glottalized' is also used for ejective and implosive consonants; see glottalic consonant for examples. There are two other ways to represent glottalization of sonorants in

810-433: The 100 items produced classificatory results just as good as, if not slightly better than, those of the whole list. Word lists gathered subsequently therefore contain only 40 items (or fewer, when attestations for some are lacking). In papers published since 2008, ASJP has employed a similarity judgment program based on Levenshtein distance (LD). This approach was found to produce better classificatory results measured against expert opinion than

855-528: The Bantu languages of Africa are descended from Latin, coining the French linguistic term nitale in doing so. Just as Egyptian is related to Brabantic, following Becanus in his Hieroglyphica , still using comparative methods. The first practitioners of comparative linguistics were not universally acclaimed: upon reading Becanus' book, Scaliger wrote, "never did I read greater nonsense", and Leibniz coined

900-606: The Celtic language is the oldest, and the mother of all others. In 1759, Joseph de Guignes theorized ( Mémoire dans lequel on prouve que les Chinois sont une colonie égyptienne ) that the Chinese and Egyptians were related, the former being a colony of the latter. In 1885, Edward Tregear ( The Aryan Maori ) compared the Maori and "Aryan" languages. Jean Prat  [ fr ] , in his 1941 Les langues nitales , claimed that

945-420: The French word logement, meaning 'dwelling,' originated from the word l'eau, which means 'water.'

Glottalized

Glottalization is the complete or partial closure of the glottis during the articulation of another sound. Glottalization of vowels and other sonorants is most often realized as creaky voice (partial closure). Glottalization of obstruent consonants usually involves complete closure of

990-542: The comparative method has not generally produced results that have met with wide acceptance. The method has also not been very good at unambiguously identifying sub-families; thus, different scholars have produced conflicting results, for example in Indo-European. A number of methods based on statistical analysis of vocabulary have been developed to try and overcome this limitation, such as lexicostatistics and mass comparison . The former uses lexical cognates like

1035-413: The comparative method, which was developed over many years, culminating in the nineteenth century. This uses a long word list and detailed study. However, it has been criticized for example as subjective, informal, and lacking testability. The comparative method uses information from two or more languages and allows reconstruction of the ancestral language. The method of internal reconstruction uses only

1080-401: The comparative method, while the latter uses only lexical similarity . The theoretical basis of such methods is that vocabulary items can be matched without a detailed language reconstruction and that comparing enough vocabulary items will negate individual inaccuracies; thus, they can be used to determine relatedness but not to determine the proto-language. The earliest method of this type was

1125-475: The comparison may be more restricted, e.g. just to the lexicon. In some methods it may be possible to reconstruct an earlier proto-language . Although the proto-languages reconstructed by the comparative method are hypothetical, a reconstruction may have predictive power. The most notable example of this is Ferdinand de Saussure 's proposal that the Indo-European consonant system contained laryngeals ,

1170-402: The database includes pidgins, creoles, mixed languages, and constructed languages. Words of the database are transcribed into a simplified standard orthography (ASJPcode). The database has been used to estimate dates at which language families have diverged into daughter languages by a method related to but still different from glottochronology, to determine the homeland (Urheimat) of

1215-530: The features of a proto-language, apart from the fact of the existence of shared items of the compared vocabulary. These approaches have been challenged for their methodological problems, since without a reconstruction or at least a detailed list of phonological correspondences there can be no demonstration that two words in different languages are cognate. There are other branches of linguistics that involve comparing languages, which are not, however, part of comparative linguistics : Comparative linguistics includes

1260-457: The following symbols to encode phonemes: p b f v m w 8 t d s z c n r l S Z C j T 5 y k g x N q X h 7 L 4 G ! i e E 3 a u o. They represent 7 vowels and 34 consonants, all found on the standard QWERTY keyboard. A ~ mark follows two consonants so that they are considered to be in the same position. Thus, kʷat becomes kw~at. Syllables like kat, wat, kaw and kwi are considered lexically similar to kw~at. Similarly,
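The position-merging role of the ~, $ and " marks described in this article can be sketched in Python. This is only an illustration under stated assumptions: the function name `positions` and the list-of-strings representation are hypothetical, not part of any ASJP software, and the sketch does not check that merged symbols are consonants.

```python
# The 41 ASJPcode symbols quoted in the text (34 consonants, 7 vowels).
CONSONANTS = set("pbfvmw8tdszcnrlSZCjT5ykgxNqXh7L4G!")
VOWELS = set("ieE3auo")


def positions(word):
    """Split an ASJPcode word into positions (hypothetical helper).

    '~' merges the two preceding symbols into one position,
    '$' merges the three preceding symbols, and
    '"' attaches to the preceding symbol (glottalization).
    """
    slots = []
    for ch in word:
        if ch in ("~", "$"):
            n = 2 if ch == "~" else 3
            if len(slots) < n:
                raise ValueError(f"{ch!r} needs {n} preceding symbols")
            merged = "".join(slots[-n:])
            del slots[-n:]
            slots.append(merged)
        elif ch == '"':
            if not slots:
                raise ValueError('\'"\' needs a preceding symbol')
            slots[-1] += '"'
        elif ch in CONSONANTS or ch in VOWELS:
            slots.append(ch)
        else:
            raise ValueError(f"not an ASJPcode symbol: {ch!r}")
    return slots
```

Under these assumptions, `positions("kw~at")` yields three positions, with `kw` occupying the first one, which is why a word such as kat or wat can line up against it position by position.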

1305-627: The former and distance-based methods are similar to the latter (see Quantitative comparative linguistics). The characters used can be morphological or grammatical as well as lexical. Since the mid-1990s these more sophisticated tree- and network-based phylogenetic methods have been used to investigate the relationships between languages and to determine approximate dates for proto-languages. These are considered by many to show promise but are not wholly accepted by traditionalists. However, they are not intended to replace older methods but to supplement them. Such statistical methods cannot be used to derive

1350-440: The glottal stop is an allophone of /p/ , /t/ and /k/ word-finally, and when followed by an unstressed vowel (including syllabic /l/ /m/ and /n/ ) in a post-stress syllable. 'Water' can be pronounced [ˈwɔːʔə] – the glottal stop has superseded the 't' sound. Other examples include "city" [ˈsɪʔi] , "bottle" [ˈbɒʔo] , "Britain" [ˈbɹɪʔən] , "seniority" [sɪiniˈɒɹəʔi] . In some consonant clusters , glottal replacement of /t/

1395-464: The information. An outgrowth of lexicostatistics is glottochronology , initially developed in the 1950s, which proposed a mathematical formula for establishing the date when two languages separated, based on percentage of a core vocabulary of culturally independent words. In its simplest form a constant rate of change is assumed, though later versions allow variance but still fail to achieve reliability. Glottochronology has met with mounting scepticism, and

1440-412: The latter generally consists of creaky phonation, there is some allophony involved. In pre-final contexts, a variation occurs (especially before voiced consonants) ranging from creaky phonation throughout the vowel to a sequence of a vowel, glottal stop , and a slightly rearticulated vowel: /maˀˈnʲoʐ/ ('deer') → [maʔa̯ˈnʲoʂ] . When a phoneme is accompanied (either sequentially or simultaneously) by

1485-566: The method applied is not well-defined: the criterion of similarity is subjective and thus not subject to verification or falsification, which is contrary to the principles of the scientific method. Second, the large size of every language's vocabulary and the relatively limited inventory of articulated sounds used by most languages make it easy to find coincidentally similar words between languages. There are sometimes political or religious reasons for associating languages in ways that some linguists would dispute. For example, it has been suggested that

1530-477: The method is useful for preliminary grouping of languages known to be related as a first step toward more in-depth comparative analysis. However, since mass comparison eschews the establishment of regular changes, it is flatly rejected by the majority of historical linguists. Recently, computerised statistical hypothesis testing methods have been developed which are related to both the comparative method and lexicostatistics. Character-based methods are similar to

1575-490: The method used initially. LD is defined as the minimum number of successive changes necessary to convert one word into another, where each change is the insertion, deletion, or substitution of a symbol. Within the Levenshtein approach, differences in word length can be corrected for by dividing LD by the number of symbols of the longer of the two compared words. This produces normalized LD (LDN). An LDN divided (LDND) between
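The definitions of LD and LDN given here can be illustrated with a short Python sketch. It is a textbook dynamic-programming implementation, not the ASJP project's own software; the function names `levenshtein` and `ldn` are chosen for this example, and each character of the input string is treated as one symbol (an assumption that ignores multi-character positions such as kw~).

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions
    needed to turn sequence a into sequence b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,             # delete x
                cur[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def ldn(a, b):
    """Normalized LD: LD divided by the length of the longer word."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0
```

For instance, turning kat into kwat takes one insertion, so LD is 1 and LDN is 1/4.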

1620-579: The mid-1900s that Basque is clearly related to the extinct Pictish and Etruscan languages, in an attempt to show that Basque was a remnant of an "Old European culture". In the Dissertatio de origine gentium Americanarum (1625), the Dutch lawyer Hugo Grotius "proves" that the American Indians (Mohawks) speak a language (lingua Maquaasiorum) derived from Scandinavian languages (Grotius

1665-559: The original language. Some believers in Abrahamic religions try to derive their native languages from Classical Hebrew, as did Herbert W. Armstrong, a proponent of British Israelism, who said that the word British comes from Hebrew brit meaning 'covenant' and ish meaning 'man', supposedly proving that the British people are the 'covenant people' of God. The Lithuanian-American archaeologist Marija Gimbutas argued during

1710-410: The project in other ways. The main driving force behind the founding of the consortium was Cecil H. Brown . Søren Wichmann is daily curator of the project. A third central member of the consortium is Eric W. Holman, who has created most of the software used in the project. While word lists used were originally based on the 100-item Swadesh list , it was statistically determined that a subset of 40 of

1755-497: The study of the historical relationships of languages using the comparative method to search for regular (i.e., recurring) correspondences between the languages' phonology, grammar, and core vocabulary, and through hypothesis testing, which involves examining specific patterns of similarity and difference across languages; some persons with little or no specialization in the field sometimes attempt to establish historical associations between languages by noting similarities between them, in

1800-528: The term goropism (from Goropius ) to designate a far-sought, ridiculous etymology. There have also been assertions that humans are descended from non-primate animals, with the use of the voice being the primary basis for comparison. Jean-Pierre Brisset (in La Grande Nouvelle, around 1900) believed and claimed that humans evolved from frogs through linguistic connections, arguing that the croaking of frogs resembles spoken French. He suggested that

1845-459: The two languages is calculated by dividing the average LDN for all the word pairs involving the same meaning by the average LDN for all the word pairs involving different meanings. This second normalization is intended to correct for chance similarity. The ASJP uses the following 40-word list. It is similar to the Swadesh–Yakhontov list , but has some differences. ASJP version from 2016 uses
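The second normalization (LDND) can be sketched as follows, under the assumption that each word list is a Python dict mapping a meaning to a single transcribed word. The function names and the dict representation are hypothetical choices for this example, not the ASJP software's interface.

```python
def levenshtein(a, b):
    """Plain Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def ldn(a, b):
    """LD divided by the length of the longer word."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def ldnd(list_a, list_b):
    """Average LDN over same-meaning pairs divided by the
    average LDN over different-meaning pairs."""
    shared = sorted(set(list_a) & set(list_b))
    same = [ldn(list_a[m], list_b[m]) for m in shared]
    diff = [ldn(list_a[m], list_b[n])
            for m in shared for n in shared if m != n]
    if not same or not diff:
        raise ValueError("need at least two shared meanings")
    mean_diff = sum(diff) / len(diff)
    if mean_diff == 0:
        raise ValueError("different-meaning pairs are all identical")
    return (sum(same) / len(same)) / mean_diff
```

A value near 0 means same-meaning words are far more alike than chance pairings; a value near 1 means they are no more alike than words with unrelated meanings.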

1890-455: The two languages was calculated as a percentage of the total number of words compared that were judged as similar. This method was applied to 100-item word lists for 250 languages from language families including Austroasiatic , Indo-European , Mayan , and Muskogean . The ASJP Consortium, founded around 2008, came to involve around 25 professional linguists and other interested parties working as volunteer transcribers and/or extending aid to

1935-496: Was on Sweden's payroll), supporting Swedish colonial pretensions in America. The Dutch doctor Johannes Goropius Becanus , in his Origines Antverpiana (1580) admits Quis est enim qui non amet patrium sermonem ("Who does not love his fathers' language?"), whilst asserting that Hebrew is derived from Dutch. The Frenchman Éloi Johanneau claimed in 1818 ( Mélanges d'origines étymologiques et de questions grammaticales ) that

1980-419: Was originally developed as a means for objectively evaluating the similarity of words with the same meaning from different languages, with the ultimate goal of classifying languages computationally, based on the lexical similarities observed. In the first ASJP paper two semantically identical words from compared languages were judged similar if they showed at least two identical sound segments. Similarity between
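The early similarity judgment and the percentage score described in this article might be sketched roughly as below. The text does not say how "identical sound segments" were matched, so this example makes a loud simplifying assumption: segments are single characters and are counted as shared regardless of their position in the word. The function names and the dict-based word lists are likewise hypothetical.

```python
from collections import Counter


def judged_similar(word_a, word_b, min_shared=2):
    """True if the words share at least min_shared segments.

    Assumption: a multiset intersection of characters stands in for
    'identical sound segments'; position in the word is ignored.
    """
    shared = Counter(word_a) & Counter(word_b)
    return sum(shared.values()) >= min_shared


def percent_similar(list_a, list_b):
    """Percentage of compared meanings whose words are judged similar."""
    shared = set(list_a) & set(list_b)
    if not shared:
        raise ValueError("no meanings in common")
    hits = sum(judged_similar(list_a[m], list_b[m]) for m in shared)
    return 100.0 * hits / len(shared)
```

Because position is ignored, this sketch is more permissive than a rule that also requires the matching segments to occur in corresponding places.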

2025-561: Was replaced by a glottal stop, even in word-initial position, except when it followed *ŋ ( *kayu → Gorontalo ayu , *konuku → onu'u ). In Hawaiian , the glottal stop is reconstructed to have come from other Proto-Polynesian consonants. The following table displays the shift /k/ → /ʔ/ as well as the shift /t/ → /k/ . Glottal replacement is not purely a feature of consonants. Yanesha' has three vowel qualities ( /a/ , /e/ , and /o/ ) that have phonemic contrasts between short, long, and "laryngeal" or glottalized forms. While
