In corpus linguistics, a collocation is a series of words or terms that co-occur more often than would be expected by chance. In phraseology, a collocation is a type of compositional phraseme, meaning that it can be understood from the words that make it up. This contrasts with an idiom, where the meaning of the whole cannot be inferred from its parts and may be completely unrelated.
The Gemeinsame Normdatei (translated as Integrated Authority File) or GND is an international authority file for the organisation of personal names, subject headings and corporate bodies from catalogues. It is used mainly for documentation in libraries and increasingly also by archives and museums. The GND is managed by the German National Library (German: Deutsche Nationalbibliothek; DNB) in cooperation with various regional library networks in German-speaking Europe and other partners. The GND falls under
a key word in context (KWIC) and identify the words immediately surrounding them. This gives an idea of the way words are used. The processing of collocations involves a number of parameters, the most important of which is the measure of association, which evaluates whether the co-occurrence is purely by chance or statistically significant. Due to the non-random nature of language, most collocations are classed as significant, and
a syntactic relation (such as verb–object: make and decision), lexical relation (such as antonymy), or they can be in no linguistically defined relation. Knowledge of collocations is vital for the competent use of a language: a grammatically correct sentence will stand out as awkward if collocational preferences are violated. This makes collocation an interesting area for language teaching. Corpus linguists specify
164-497: A brief descriptive epithet. When catalogers come across different subjects with similar or identical headings, they can disambiguate them using authority control. A customary way of enforcing authority control in a bibliographic catalog is to set up a separate index of authority records, which relates to and governs the headings used in the main catalog. This separate index is often referred to as an "authority file". It contains an indexable record of all decisions made by catalogers in
a continuum: In 1933, Harold Palmer's Second Interim Report on English Collocations highlighted the importance of collocation as a key to producing natural-sounding language, for anyone learning a foreign language. Thus from the 1940s onwards, information about recurrent word combinations became a standard feature of monolingual learner's dictionaries. As these dictionaries became "less word-centred and more phrase-centred", more attention
a corpus of size $N$, and let $P(w_2) = \frac{\#w_2}{N}$ be the unconditional probability of occurrence of $w_2$ in the corpus. The t-score for the bigram $w_1 w_2$
287-443: A correlation between a lexeme and a lexical-grammatical pattern, or as a relation between a base and its collocative partners; and expression, a pragmatic view of collocation as a conventional unit of expression, regardless of form. These different perspectives contrast with the usual way of presenting collocation in phraseological studies. Traditionally speaking, collocation is explained in terms of all three perspectives at once, in
328-561: A database and called an authority file , and maintaining and updating these files as well as "logical linkages" to other files within them is the work of librarians and other information catalogers. Accordingly, authority control is an example of controlled vocabulary and of bibliographic control . While in theory any piece of information is amenable to authority control such as personal and corporate names, uniform titles , series names, and subjects, library catalogers typically focus on author names and titles of works. Traditionally, one of
369-401: A given library (or—as is increasingly the case—cataloging consortium), which catalogers consult when making, or revising, decisions about headings. As a result, the records contain documentation about sources used to establish a particular preferred heading, and may contain information discovered while researching the heading which may be useful. While authority files provide information about
a given work under one unique heading even when such versions are issued under different titles. With authority control, one unique preferred name represents all variations and will include different variations, spellings and misspellings, uppercase versus lowercase variants, differing dates, and so forth. For example, in Wikipedia, the first wife of Charles III is described by an article Diana, Princess of Wales as well as numerous other descriptors, e.g. Princess Diana, but both Princess Diana and Diana, Princess of Wales describe
a number of specialized dictionaries devoted to describing the frequent collocations in a language. These include (for Spanish) Redes: Diccionario combinatorio del español contemporáneo (2004), (for French) Le Robert: Dictionnaire des combinaisons de mots (2007), and (for English) the LTP Dictionary of Selected Collocations (1997) and the Macmillan Collocations Dictionary (2010). Student's t-test can be used to determine whether
a particular subject, their primary function is not to provide information but to organize it. They contain enough information to establish that a given author or title is unique, but that is all; irrelevant but interesting information is generally excluded. Although practices vary internationally, authority records in the English-speaking world generally contain the following information: Since
533-476: A user queries the catalog under one of these variant forms of the author's name, he or she would receive the response: "See O'Brien, Flann, 1911–1966." There is an additional spelling variant of the Gopaleen name: "Na gCopaleen, Myles, 1911–1966" has an extra C inserted because the author also employed the non-anglicized Irish spelling of his pen-name, in which the capitalized C shows the correct root word while
574-462: A variety of legal names in the course of their lifetime, as well as a variety of nicknames, pen names, stage names or other alternative names. It may be particularly difficult to choose a single authorized heading for individuals whose various names have controversial political or social connotations, when the choice of authorized heading may be seen as endorsement of the associated political or social ideology. An alternative to using authorized headings
is VIAF ID: 107032638, that is, a common number representing all of these variations. The English Wikipedia prefers the term "Diana, Princess of Wales", but at the bottom of the article about her, there are links to various international cataloging efforts for reference purposes. Sometimes two different authors have been published under the same name. This can happen if there is a title which
is a computational technique that finds collocations in a document or corpus, using various computational linguistics elements resembling data mining. Collocations are partly or fully fixed expressions that become established through repeated context-dependent use. Such terms as crystal clear, middle management, nuclear family, and cosmetic surgery are examples of collocated pairs of words. Collocations can be in
is calculated as $t = \frac{\bar{x} - \mu}{\sqrt{s^2 / N}}$, where $s^2$ is the sample variance, $\bar{x} = \frac{\#w_1 w_2}{N}$ is the sample mean of the occurrence of $w_1 w_2$, $\#w_1 w_2$
738-409: Is identical to another title or to a collective uniform title. This, too, can cause confusion. Different authors can be distinguished correctly from each other by, for example, adding a middle initial to one of the names; in addition, other information can be added to one entry to clarify the subject, such as birth year, death year, range of active years such as 1918–1965 when the person flourished , or
779-587: Is not about creating a perfect seamless system but rather it is an ongoing effort to keep up with these changes and try to bring "structure and order" to the task of helping users find information. Sometimes within a catalog, there are diverse names or spellings for only one person or subject. This variation may cause researchers to overlook relevant information. Authority control is used by catalogers to collocate materials that logically belong together but that present themselves differently. Records are used to establish uniform titles that collocate all versions of
820-465: Is the idea of access control , where various forms of a name are related without the endorsement of one particular form. Before the advent of digital online public access catalogs and the Internet, individual cataloging departments within each library generally carried out creating and maintaining a library's authority files. Naturally, there was a considerable difference in the authority files of
is the number of occurrences of $w_1 w_2$, and $\mu = P(w_1)P(w_2)$ is the probability of $w_1 w_2$ under the null hypothesis that $w_1$ and $w_2$ appear independently in
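
As a rough illustration of the t-score built from these quantities, the following Python sketch counts unigrams and adjacent bigrams in a token list and compares the observed bigram rate with the rate expected under independence. The function name `bigram_t_score` and the toy sentence are hypothetical. The variance estimate $s^2 = \bar{x}(1 - \bar{x})$, which treats each position as a Bernoulli trial, is an assumption that the surrounding text does not state, as is the convention of returning 0.0 for a bigram that never occurs.

```python
import math
from collections import Counter


def bigram_t_score(tokens, w1, w2):
    """Illustrative t-score for the bigram (w1, w2) in a list of tokens."""
    n = len(tokens)                             # corpus size N
    unigrams = Counter(tokens)                  # counts #w
    bigrams = Counter(zip(tokens, tokens[1:]))  # counts of adjacent pairs

    x_bar = bigrams[(w1, w2)] / n                 # sample mean: #w1w2 / N
    mu = (unigrams[w1] / n) * (unigrams[w2] / n)  # P(w1) * P(w2) under independence

    # Assumption: Bernoulli sample variance s^2 = x_bar * (1 - x_bar).
    s2 = x_bar * (1 - x_bar)
    if s2 == 0:
        # Convention chosen for this sketch: no evidence if the bigram never occurs.
        return 0.0
    return (x_bar - mu) / math.sqrt(s2 / n)


tokens = "new york is a new city in new york state".split()
score = bigram_t_score(tokens, "new", "york")
```

A higher score suggests that the pair co-occurs more often than independence would predict. In practice the scores are mainly useful for ranking candidate bigrams within one corpus, and a toy corpus of this size is far too small for the significance test to mean anything.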
the Creative Commons Zero (CC0) licence. The GND specification provides a hierarchy of high-level entities and sub-classes, useful in library classification, and an approach to unambiguous identification of single elements. It also comprises an ontology intended for knowledge representation in the semantic web, available in the RDF format. The GND became operational in April 2012 and integrates
943-465: The Library of Congress chose as authoritative. In theory, every record in the catalog that represents a work by this author should have this form of the name as its author heading. What follows immediately below the heading beginning with Na Gopaleen, Myles, 1911–1966 are the see references. These forms of the author's name will appear in the catalog, but only as transcriptions and not as headings. If
the Virtual International Authority File (VIAF) project. There are six main types of GND entities:

Authority file

In information science, authority control is a process that organizes information, for example in library catalogs, by using a single, distinct spelling of a name (heading) or a (generally alphanumeric) identifier for each topic or concept. The word authority in authority control derives from
1025-527: The Irish writer Brian O'Nolan , who lived from 1911 to 1966, wrote under many pen names such as Flann O'Brien and Myles na Gopaleen. Catalogers at the United States Library of Congress chose one form—"O'Brien, Flann, 1911–1966"—as the official heading. The example contains all three elements of a valid authority record: the first heading O'Brien, Flann, 1911–1966 is the form of the name that
the United States Library of Congress. The idea is to create a single worldwide virtual authority file. For example, the ID for Princess Diana in the GND is 118525123 (preferred name: Diana <Wales, Prinzessin>) while the United States Library of Congress uses the term Diana, Princess of Wales, 1961–1997; other authority files have other choices. The Virtual International Authority File choice for all of these variations
1107-628: The advent of automated database technologies, catalogers began to establish cooperative consortia, such as OCLC and RLIN in the United States , in which cataloging departments from libraries all over the world contributed their records to, and took their records from, a shared database. This development prompted the need for national standards for authority work. In the United States, the primary organization for maintaining cataloging standards with respect to authority work operates under
the aegis of the Library of Congress Program for Cooperative Cataloging. It is known as the Name Authority Cooperative Program, or NACO Authority. There are various standards using different acronyms. Standards for authority metadata: Standards for object identification, controlled by an identification-authority: Standards for identified-object metadata (examples): vCard, Dublin Core, etc.

Collocation

There are about seven main types of collocations: adjective + noun, noun + noun (such as collective nouns), noun + verb, verb + noun, adverb + adjective, verbs + prepositional phrase (phrasal verbs), and verb + adverb. Collocation extraction
the association scores are simply used to rank the results. Commonly used measures of association include mutual information, t scores, and log-likelihood. Rather than select a single definition, Gledhill proposes that collocation involves at least three different perspectives: co-occurrence, a statistical view, which sees collocation as the recurrent appearance in a text of a node and its collocates; construction, which sees collocation either as
1230-559: The confusion. One international effort to prevent such confusion is the Virtual International Authority File which is a collaborative attempt to provide a single heading for a particular subject. It is a way to standardize information from different authority files around the world such as the Integrated Authority File (GND) maintained and used cooperatively by many libraries in German-speaking countries and
1271-410: The content of the following authority files, which have since been discontinued: It is referred to by identifiers named GND-ID . At the time of its introduction on 5 April 2012, the GND held 9,493,860 files, including 2,650,000 personalised names. In July 2020 non-individualized files were deleted. In 2022, the GND held 9,370,736 files, including 5,937,788 personalised names. The GND participates in
1312-416: The different libraries. For the early part of library history, it was generally accepted that, as long as a library's catalog was internally consistent, the differences between catalogs in different libraries did not matter greatly. As libraries became more attuned to the needs of researchers and began interacting more with other libraries, the value of standard cataloging practices came to be recognized. With
1353-631: The headings function as access points, making sure that they are distinct and not in conflict with existing entries is important. For example, the English novelist William Collins (1824–89), whose works include the Moonstone and The Woman in White is better known as Wilkie Collins. Cataloguers have to decide which name the public would most likely look under, and whether to use a see also reference to link alternative forms of an individual's name. For example,
1394-491: The idea that the names of people, places, things, and concepts are authorized, i.e., they are established in one particular form. These one-of-a-kind headings or identifiers are applied consistently throughout catalogs which make use of the respective authority file, and are applied for other methods of organizing data such as linkages and cross references . Each controlled entry is described in an authority record in terms of its scope and usage, and this organization helps
1435-428: The justification for this particular form of the name: it appeared in this form on the 1939 edition of the author's novel At Swim-Two-Birds , whereas the author's other noms de plume appeared on later publications. The act of choosing a single authorized heading to represent all forms of a name is quite often a difficult and complex task, considering that any given individual may have legally changed their name or used
1476-557: The library staff maintain the catalog and make it user-friendly for researchers. Catalogers assign each subject—such as author, topic, series, or corporation—a particular unique identifier or heading term which is then used consistently, uniquely, and unambiguously for all references to that same subject, which removes variations from different spellings, transliterations , pen names , or aliases . The unique header can guide users to all relevant information including related or collocated subjects. Authority records can be combined into
the most commonly used authority files globally are the subject headings from the Library of Congress. More recently, links to articles and categories of Wikipedia emerged to function as an authority file due to the popularity of the encyclopedia, where each article is a notable topic or concept similar to other authority files. As time passes, information changes, prompting needs for reorganization. According to one view, authority control
the occurrence of a collocation in a corpus is statistically significant. For a bigram $w_1 w_2$, let $P(w_1) = \frac{\#w_1}{N}$ be the unconditional probability of occurrence of $w_1$ in
1599-492: The preceding g indicates its pronunciation in context. So if a library user comes across this spelling variant, he or she will be led to the same author regardless. See also references, which point from one authorized heading to another authorized heading, are exceedingly rare for personal name authority records, although they often appear in name authority records for corporate bodies. The final four entries in this record beginning with His At Swim-Two-Birds ... 1939. constitute
1640-673: The same person so they all redirect to the same main article; in general, all authority records choose one title as the preferred one for consistency. In an online library catalog, various entries might look like the following: These terms describe the same person. Accordingly, authority control reduces these entries to one unique entry or officially authorized heading, sometimes termed an access point : Diana, Princess of Wales, 1961–1997. Generally, there are different authority file headings and identifiers used by different libraries in different countries, possibly inviting confusion, but there are different approaches internationally to try to lessen
1681-626: Was paid to collocation. This trend was supported, from the beginning of the 21st century, by the availability of large text corpora and intelligent corpus-querying software , making it possible to provide a more systematic account of collocation in dictionaries. Using these tools, dictionaries such as the Macmillan English Dictionary and the Longman Dictionary of Contemporary English included boxes or panels with lists of frequent collocations. There are also