Controlled vocabularies provide a way to organize knowledge for subsequent retrieval. They are used in subject indexing schemes, subject headings, thesauri, taxonomies and other knowledge organization systems. Controlled vocabulary schemes mandate the use of predefined, preferred terms that have been preselected by the designers of the schemes, in contrast to natural language vocabularies, which have no such restriction.
Medical Subject Headings (MeSH) is a comprehensive controlled vocabulary for the purpose of indexing journal articles and books in the life sciences. It serves as a thesaurus that facilitates searching. Created and updated by the United States National Library of Medicine (NLM), it is used by the MEDLINE/PubMed article database and by NLM's catalog of book holdings. MeSH is also used by the ClinicalTrials.gov registry to classify which diseases are studied by trials registered in ClinicalTrials.gov. MeSH
a Data Bank and put into In) with the hope that this would increase use by industry. After a second draft guidance was released in June 2001, a final guidance was issued on March 18, 2002, titled "Guidance for Industry Information Program on Clinical Trials for Serious or Life-Threatening Diseases and Conditions". The Best Pharmaceuticals for Children Act of 2002 (Public Law 107-109) amended
a journal article with a PubMed identification number (PMID). Such a link is created either by the author of the journal article, by mentioning the trial ID in the abstract (abstract trial-article link), or by the trial record manager, when the registry record is updated with the PMID of an article that reports trial results (registry trial-article link). A 2013 study analyzing 8907 interventional trials registered in ClinicalTrials.gov found that 23.2% of trials had abstract-linked result articles and 7.3% had registry-linked articles; 2.7% had both types of link. Most trials are linked to
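As a quick check on these figures, inclusion-exclusion gives the share of trials with at least one kind of link: 23.2 + 7.3 - 2.7 = 27.8%. The Python sketch below simply encodes that arithmetic; it assumes, as "both types of link" suggests, that the 2.7% is counted inside each of the other two percentages. The function names are illustrative.

```python
# Percentages reported by the 2013 study of 8907 interventional trials.
ABSTRACT_LINKED = 23.2  # % of trials with an abstract trial-article link
REGISTRY_LINKED = 7.3   # % of trials with a registry trial-article link
BOTH_LINKED = 2.7       # % of trials with both kinds of link


def at_least_one_link(a: float, r: float, both: float) -> float:
    """Inclusion-exclusion: |A or R| = |A| + |R| - |A and R|."""
    return round(a + r - both, 1)


def only_one_kind(a: float, r: float, both: float) -> tuple[float, float]:
    """Share of trials linked only via the abstract, and only via the registry."""
    return round(a - both, 1), round(r - both, 1)


if __name__ == "__main__":
    print(at_least_one_link(ABSTRACT_LINKED, REGISTRY_LINKED, BOTH_LINKED))  # 27.8
    print(only_one_kind(ABSTRACT_LINKED, REGISTRY_LINKED, BOTH_LINKED))      # (20.5, 4.6)
```

On these assumptions, roughly 72.2% of the sampled trials had no linked result article of either kind.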
a person including, but not limited to, name, honorific prefix, affiliation, email address, and homepage, or the Person vocabulary of Schema.org. Similarly, a book can be described using the Book vocabulary of Schema.org and general publication terms from the Dublin Core vocabulary, an event with the Event vocabulary of Schema.org, and so on. To use machine-readable terms from any controlled vocabulary, web designers can choose from
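As an illustration of the Person vocabulary, the following Python sketch emits a Schema.org Person description as JSON-LD, one of the annotation formats mentioned in this article. The person and the helper name `person_jsonld` are hypothetical. Schema.org's Person type has no `homepage` property (FOAF uses `foaf:homepage`), so the sketch maps a homepage to `url`; treat that mapping as a modelling choice rather than a rule.

```python
import json


def person_jsonld(name, honorific_prefix=None, affiliation=None,
                  email=None, homepage=None):
    """Return a Schema.org Person description as a JSON-LD-ready dict."""
    doc = {"@context": "https://schema.org", "@type": "Person", "name": name}
    if honorific_prefix:
        doc["honorificPrefix"] = honorific_prefix
    if affiliation:
        # Schema.org expects an Organization as the value of affiliation.
        doc["affiliation"] = {"@type": "Organization", "name": affiliation}
    if email:
        doc["email"] = email
    if homepage:
        # Person has no "homepage" property; url is used instead (assumption).
        doc["url"] = homepage
    return doc


if __name__ == "__main__":
    example = person_jsonld("Jane Doe", "Dr.", "Example University",
                            "jane@example.org", "https://example.org/~jane")
    print(json.dumps(example, indent=2))
```

The resulting JSON can be embedded in a page inside a `<script type="application/ld+json">` element.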
a public library. In large organizations, controlled vocabularies may be introduced to improve technical communication. The use of controlled vocabulary ensures that everyone is using the same word to mean the same thing. This consistency of terms is one of the most important concepts in technical writing and knowledge management, where effort is expended to use the same word throughout a document or organization instead of slightly different ones to refer to
a result of pressure from HIV-infected men in the gay community, who demanded better access to clinical trials, the U.S. Congress passed the Health Omnibus Programs Extension Act of 1988 (Public Law 100-607), which mandated the development of a database of AIDS Clinical Trials Information Services (ACTIS). This effort served as an example of what might be done to improve public access to clinical trials, and motivated other disease-related interest groups to push for something similar for all diseases. The Food and Drug Administration Modernization Act of 1997 (Public Law 105-115) amended
a unique alphanumerical ID that will not change. Most subject headings come with a short description or definition. See the MeSH description for diabetes type 2 as an example. The explanatory text is written by the MeSH team based on their standard sources if not otherwise stated. References are mostly encyclopaedias and standard textbooks of the subject areas. References for specific statements in
a variety of annotation formats, including RDFa, HTML5 Microdata, or JSON-LD in the markup, or RDF serializations (RDF/XML, Turtle, N3, TriG, TriX) in external files.
ClinicalTrials.gov
ClinicalTrials.gov is a registry of clinical trials. It is run by the United States National Library of Medicine (NLM) at the National Institutes of Health, and holds registrations of over 444,000 trials from 221 countries. As
is a carefully selected list of words and phrases, which are used to tag units of information (document or work) so that they may be more easily retrieved by a search. Controlled vocabularies solve the problems of homographs, synonyms and polysemes by a bijection between concepts and preferred terms. In short, controlled vocabularies reduce unwanted ambiguity inherent in normal human languages where
is designed on faceted classification principles. Controlled vocabularies of the Semantic Web define the concepts and relationships (terms) used to describe a field of interest or area of concern. For instance, to declare a person in a machine-readable format, a vocabulary is needed that has the formal definition of "Person", such as the Friend of a Friend (FOAF) vocabulary, which has a Person class that defines typical properties of
is divided into four types of terms. The main ones are the "headings" (also known as MeSH headings or descriptors), which describe the subject of each article (e.g., "Body Weight", "Brain Edema" or "Critical Care Nursing"). Most of these are accompanied by a short description or definition, links to related descriptors, and a list of synonyms or very similar terms (known as entry terms). MeSH contains approximately 30,000 entries (as of 2022) and
is low. For example, an article might mention football as a secondary focus, and the indexer might decide not to tag it with "football" because it is not important enough compared to the main focus. If that article nevertheless turns out to be relevant to the searcher, recall fails. A free text search would automatically pick up that article regardless. On the other hand, free text searches have high exhaustivity (every word
is not neutral, and the indexer must carefully consider the ethics of their word choices. For example, traditionally colonialist terms have often been the preferred terms in chosen vocabularies when discussing First Nations issues, which has caused controversy. Controlled vocabularies, such as the Library of Congress Subject Headings, are an essential component of bibliography, the study and classification of books. They were initially developed in library and information science. In
is searched), so although it has much lower precision, it has the potential for high recall as long as the searcher overcomes the problem of synonyms by entering every combination. Controlled vocabularies may become outdated rapidly in fast-developing fields of knowledge, unless the preferred terms are updated regularly. Even in an ideal scenario, a controlled vocabulary is often less specific than the words of
is the name given to a number of different team sports. Worldwide the most popular of these team sports is association football, which also happens to be called soccer in several countries. The word football is also applied to rugby football (rugby union and rugby league), American football, Australian rules football, Gaelic football, and Canadian football. A search for football therefore will retrieve documents that are about several completely different sports. Controlled vocabulary solves this problem by tagging
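The football problem can be reproduced with a toy collection. In this hypothetical Python sketch each document carries both free text and a controlled-vocabulary tag, with "Soccer" assumed (for illustration only) to be the preferred term for association football; the documents and function names are invented.

```python
# Toy collection: each document has free text plus controlled-vocabulary tags.
# Texts and tag strings are hypothetical.
DOCS = {
    1: {"text": "Football transfer window opens in the Premier League", "tags": {"Soccer"}},
    2: {"text": "Football playoffs: the quarterback threw three touchdowns", "tags": {"American football"}},
    3: {"text": "Gaelic football final draws record crowd", "tags": {"Gaelic football"}},
    4: {"text": "Soccer clubs announce new youth academies", "tags": {"Soccer"}},
}


def free_text_search(word):
    """Ids of documents whose text contains the word (case-insensitive, whitespace tokens)."""
    w = word.lower()
    return {i for i, d in DOCS.items() if w in d["text"].lower().split()}


def controlled_search(term):
    """Ids of documents tagged with the given preferred term."""
    return {i for i, d in DOCS.items() if term in d["tags"]}
```

Here `free_text_search("football")` returns documents 1, 2 and 3, two of which are about other sports, while `controlled_search("Soccer")` returns documents 1 and 4, including one that never uses the word football.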
is updated annually to reflect changes in medicine and medical terminology. MeSH terms are arranged in alphabetic order and in a hierarchical structure by subject categories, with more specific terms arranged beneath broader terms. When a MeSH term is searched, the more specific MeSH terms beneath it are automatically included in the search. This is known as the extended search or explode of that MeSH term. This additional information and
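An exploded search amounts to collecting a term together with everything beneath it in the hierarchy. The sketch below does this with a breadth-first walk over a small, simplified tree; the term list is illustrative and is not taken from the current MeSH files, and the function names are hypothetical.

```python
from collections import deque

# Simplified, illustrative hierarchy: broader term -> narrower terms.
NARROWER = {
    "Neoplasms": ["Neoplasms by Site"],
    "Neoplasms by Site": ["Digestive System Neoplasms", "Breast Neoplasms"],
    "Digestive System Neoplasms": ["Liver Neoplasms", "Pancreatic Neoplasms"],
}


def explode(term):
    """Return the term plus every term beneath it, in breadth-first order."""
    result, queue, seen = [], deque([term]), {term}
    while queue:
        current = queue.popleft()
        result.append(current)
        for child in NARROWER.get(current, []):
            if child not in seen:  # a term may appear at several tree locations
                seen.add(child)
                queue.append(child)
    return result


def matches(article_terms, query_term):
    """True if the article is indexed with the query term or any narrower term."""
    return bool(set(article_terms) & set(explode(query_term)))
```

`explode("Digestive System Neoplasms")` yields that descriptor plus Liver Neoplasms and Pancreatic Neoplasms, so an article indexed only under Liver Neoplasms is still found.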
is usable for indexing web pages is PSH. It is unlikely that a single metadata scheme will ever succeed in describing the content of the entire Web. To create a Semantic Web, it may be necessary to draw from two or more metadata systems to describe a Web page's contents. The eXchangeable Faceted Metadata Language (XFML) is designed to enable controlled vocabulary creators to publish and share metadata systems. XFML
the ERIC Thesaurus. When selecting terms for a controlled vocabulary, the designer has to consider the specificity of the term chosen, whether to use direct entry, inter-consistency and stability of the language. Lastly, the amount of pre-coordination (in which case the degree of enumeration versus synthesis becomes an issue) and post-coordination in the system is another important issue. Controlled vocabulary elements (terms/phrases) employed as tags, to aid in
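The difference between pre- and post-coordination can be shown with set operations. In this hypothetical Python sketch, the toy index mixes single terms with one pre-coordinated heading written in an assumed "Heading--Subdivision" style; the terms and function names are illustrative only.

```python
# Hypothetical index: document id -> terms assigned by the indexer.
INDEX = {
    "d1": {"Libraries", "Automation"},
    "d2": {"Libraries"},
    "d3": {"Automation", "Manufacturing"},
    "d4": {"Libraries--Automation"},  # pre-coordinated heading built at indexing time
}


def post_coordinate(*terms):
    """Post-coordination: the searcher combines single terms at search time (Boolean AND)."""
    wanted = set(terms)
    return {doc for doc, assigned in INDEX.items() if wanted <= assigned}


def pre_coordinated(heading):
    """Pre-coordination: the combination already exists as one heading, so an exact lookup suffices."""
    return {doc for doc, assigned in INDEX.items() if heading in assigned}
```

`post_coordinate("Libraries", "Automation")` finds only d1, while `pre_coordinated("Libraries--Automation")` finds only d4: each approach misses what the other encodes, which is why the balance between them is a design decision for the vocabulary.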
the 1950s, government agencies began to develop controlled vocabularies for the burgeoning journal literature in specialized fields; an example is the Medical Subject Headings (MeSH) developed by the U.S. National Library of Medicine. Subsequently, for-profit firms (called abstracting and indexing services) emerged to index the fast-growing literature in every field of knowledge. In the 1960s, an online bibliographic database industry developed based on dialup X.25 networking. These services were seldom made available to
the 2007 article, "A Comparative Evaluation of Full-text, Concept-based, and Context-sensitive Search". Controlled vocabularies are often claimed to improve the accuracy of free text searching, for example by reducing irrelevant items in the retrieval list. These irrelevant items (false positives) are often caused by the inherent ambiguity of natural language. Take the English word football for example. Football
the Food, Drug and Cosmetic Act and the Public Health Service Act to require that the NIH create and operate a public information resource, which came to be called ClinicalTrials.gov, tracking drug efficacy studies resulting from approved Investigational New Drug (IND) applications (FDA Regulations 21 CFR Parts 312 and 812). With the primary purpose of improving public access to clinical trials where individuals with serious diseases and conditions might find experimental treatments, this law required information about:
The National Library of Medicine in
the National Institutes of Health made ClinicalTrials.gov available to the public via the internet on February 29, 2000. In this initial release, ClinicalTrials.gov primarily included information about NIH-sponsored trials, omitting the majority of clinical trials being performed by private industry. On March 29, 2000, the FDA issued a Draft Guidance called Information Program on Clinical Trials for Serious or Life-Threatening Diseases: Establishment of
the Public Health Service Act to require that additional information be included in ClinicalTrials.gov. As a result of toxicity tracking concerns raised following the withdrawal of several drugs from the prescription market, ClinicalTrials.gov was further reinforced by the Food and Drug Administration Amendments Act of 2007 (U.S. Public Law 110-85), which mandated the expansion of ClinicalTrials.gov for better tracking of
the article's major topics. When performing a MEDLINE search via PubMed, entry terms are automatically translated into (i.e., mapped to) the corresponding descriptors with a good degree of reliability; it is recommended to check the 'Details' tab in PubMed to see how a search formulation was translated. By default, a search for a descriptor will include all the descriptors in the hierarchy below the given one. PubMed does not apply automatic mapping of
the basic results of clinical trials, requiring:
At a 2009 meeting of the National Institutes of Health, speakers said that one of the goals was to have more clearly defined and consistent standards for reporting. As of March 2015, the NIH was still considering the details of this rule change. A study of trials conducted between 2008 and 2012 found that about half of those required to be reported had not been. A 2014 study of pre-2009 trials found that many had serious discrepancies between what
the content identification process of documents, or other information system entities (e.g., DBMS, Web Services) qualifies as metadata. There are three main types of indexing languages. When indexing a document, the indexer also has to choose the level of indexing exhaustivity, the level of detail in which the document is described. For example, using low indexing exhaustivity, minor aspects of
the controlled vocabulary scheme to make the best use of the system. But as already mentioned, the control of synonyms and homographs can help increase precision. Numerous methodologies have been developed to assist in the creation of controlled vocabularies, including faceted classification, which enables a given data record or document to be described in multiple ways. Word choice in chosen vocabularies
the correct preferred term is searched, there is no need to search for other terms that might be synonyms of that term. A controlled vocabulary search may lead to unsatisfactory recall, in that it will fail to retrieve some documents that are actually relevant to the search question. This is particularly problematic when the search question involves terms that are sufficiently tangential to
the descriptions are not given; instead, readers are referred to the bibliography. In addition to the descriptor hierarchy, MeSH contains a small number of standard qualifiers (also known as subheadings), which can be added to descriptors to narrow down the topic. For example, "Measles" is a descriptor and "epidemiology" is a qualifier; "Measles/epidemiology" describes articles about the epidemiology of measles. The "epidemiology" qualifier can be added to all other disease descriptors. Not all descriptor/qualifier combinations are allowed, since some of them may be meaningless. In all there are 83 different qualifiers. In addition to
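Building a descriptor/qualifier pair while rejecting disallowed combinations can be sketched as follows. MeSH descriptor records do list the qualifiers allowed for each descriptor, but the small allowlist here is invented for illustration and should not be read as real MeSH data; the function name is hypothetical.

```python
# Illustrative allowlist only: descriptor -> qualifiers that may be attached to it.
ALLOWED_QUALIFIERS = {
    "Measles": {"epidemiology", "prevention & control", "diagnosis"},
    "Asthma": {"drug therapy", "epidemiology", "diagnosis"},
}


def combine(descriptor, qualifier):
    """Return 'Descriptor/qualifier', refusing combinations not on the allowlist."""
    allowed = ALLOWED_QUALIFIERS.get(descriptor, set())
    if qualifier not in allowed:
        raise ValueError(f"{qualifier!r} is not an allowed qualifier for {descriptor!r}")
    return f"{descriptor}/{qualifier}"
```

With this table, `combine("Measles", "epidemiology")` returns `"Measles/epidemiology"`, whereas `combine("Measles", "drug therapy")` raises `ValueError`.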
the descriptor "Digestive System Neoplasms" has the tree numbers C06.301 and C04.588.274; C stands for Diseases, C06 for Digestive System Diseases and C06.301 for Digestive System Neoplasms; C04 for Neoplasms, C04.588 for Neoplasms By Site, and C04.588.274 also for Digestive System Neoplasms. The tree numbers of a given descriptor are subject to change as MeSH is updated. Every descriptor also carries
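Because each level of a tree number adds a dot-separated segment, the broader headings above a descriptor can be read directly off its tree numbers. A minimal Python sketch using the two tree numbers from the example above; the helper names are hypothetical.

```python
# From the example in the text: one descriptor, two tree locations.
TREE_NUMBERS = {"Digestive System Neoplasms": ["C06.301", "C04.588.274"]}


def ancestors(tree_number):
    """List the levels above a tree number, broadest first.

    'C04.588.274' -> ['C', 'C04', 'C04.588']
    """
    parts = tree_number.split(".")
    chain = [parts[0][0]]  # category letter, e.g. 'C' for Diseases
    for i in range(1, len(parts)):
        chain.append(".".join(parts[:i]))
    return chain


def all_ancestor_paths(descriptor):
    """Map each tree number of a descriptor to its chain of broader levels."""
    return {tn: ancestors(tn) for tn in TREE_NUMBERS[descriptor]}
```

For "Digestive System Neoplasms" this gives two separate paths, one through C06 (Digestive System Diseases) and one through C04 and C04.588 (Neoplasms, Neoplasms By Site), which is exactly why one descriptor can be reached from several branches.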
the descriptors, MeSH also contains some 318,000 supplementary concept records. These do not belong to the controlled vocabulary as such; instead they enlarge the thesaurus and contain links to the closest fitting descriptor to be used in a MEDLINE search. Many of these records describe chemical substances. In MEDLINE/PubMed, every journal article is indexed with about 10–15 subject headings, subheadings and supplementary concept records, with some of them designated as major and marked with an asterisk, indicating
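Major topics can be picked out mechanically when headings are written with the asterisk convention. The sketch below assumes a MEDLINE-style listing in which an asterisk placed directly before a descriptor or a qualifier marks it as major (e.g. `*Asthma/drug therapy` or `Measles/*epidemiology`); these parsing rules are an assumption, not a specification, and the function names are hypothetical.

```python
def parse_heading(heading):
    """Split 'Measles/*epidemiology' into (descriptor, qualifiers, is_major).

    Assumes a leading '*' on the descriptor or on any qualifier marks a major topic.
    """
    parts = heading.split("/")
    descriptor = parts[0].lstrip("*")
    qualifiers = [q.lstrip("*") for q in parts[1:]]
    is_major = any(p.startswith("*") for p in parts)
    return descriptor, qualifiers, is_major


def major_topics(headings):
    """Descriptors of all headings flagged as major, in input order."""
    return [d for d, _, major in map(parse_heading, headings) if major]
```

For the list `["*Asthma/drug therapy", "Humans", "Measles/*epidemiology"]`, `major_topics` returns Asthma and Measles, leaving the non-starred "Humans" out.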
Medical Subject Headings - Misplaced Pages Continue
the documents in such a way that the ambiguities are eliminated. Compared to free text searching, the use of a controlled vocabulary can dramatically increase the performance of an information retrieval system, if performance is measured by precision (the percentage of documents in the retrieval list that are actually relevant to the search topic). In some cases controlled vocabulary can enhance recall as well, because unlike natural language schemes, once
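Precision, and its counterpart recall (the share of all relevant documents that the search actually retrieves), can be computed from two sets of document identifiers. A minimal Python sketch; the function names are illustrative, and multiplying by 100 gives the percentages used in the text.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant (0.0 if nothing was retrieved)."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def recall(retrieved, relevant):
    """Fraction of relevant documents that were retrieved (0.0 if nothing is relevant)."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0
```

For example, retrieving documents {1, 2, 3, 4} when {1, 2, 5} are relevant gives a precision of 0.5 and a recall of about 0.67.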
the game pool to ensure that each preferred term or heading refers to only one concept. There are two main kinds of controlled vocabulary tools used in libraries: subject headings and thesauri. While the differences between the two are diminishing, there are still some minor differences. The terms are chosen and organized by trained professionals (including librarians and information scientists) who possess expertise in
the hierarchical structure (see below) make MeSH essentially a thesaurus, rather than a plain subject headings list. The second type of term, MeSH subheadings or qualifiers (see below), can be used with MeSH terms to more completely describe a particular aspect of a subject, such as adverse, diagnostic or genetic effects. For example, the drug therapy of asthma is displayed as asthma/drug therapy. The remaining two types of term are those that describe
the principles of user warrant (what terms users are likely to use), literary warrant (what terms are generally used in the literature and documents), and structural warrant (terms chosen by considering the structure and scope of the controlled vocabulary). Controlled vocabularies also typically handle the problem of homographs with qualifiers. For example, the term pool has to be qualified to refer to either swimming pool or
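Qualifying homographs can be modelled as a lookup that returns every qualified sense of an ambiguous word and then picks one by its qualifier. In this hypothetical Python sketch, qualified forms such as "Pool (game)" are invented labels, not actual headings from any vocabulary.

```python
# Hypothetical table: ambiguous word -> its qualified preferred terms.
HOMOGRAPHS = {
    "pool": {"Pool (swimming)", "Pool (game)"},
    "mercury": {"Mercury (planet)", "Mercury (element)"},
}


def candidate_terms(word):
    """Every qualified term an ambiguous word could stand for; unambiguous words map to themselves."""
    return HOMOGRAPHS.get(word.lower(), {word})


def resolve(word, qualifier):
    """Pick the qualified term whose parenthetical qualifier matches."""
    for term in candidate_terms(word):
        if term.endswith(f"({qualifier})"):
            return term
    raise KeyError(f"no sense of {word!r} qualified by {qualifier!r}")
```

An indexer or searcher who types "pool" is thus forced to choose between the two senses, and each document ends up tagged with exactly one concept.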
the public because they were difficult to use; specialist librarians called search intermediaries handled the searching job. In the 1980s, the first full text databases appeared; these databases contain the full text of the indexed articles as well as the bibliographic information. Online bibliographic databases have migrated to the Internet and are now publicly available; however, most are proprietary and can be expensive to use. Students enrolled in colleges and universities may be able to access some of these services without charge; some of these services may be accessible without charge at
the same concept can be given different names, and they ensure consistency. For example, in the Library of Congress Subject Headings (a subject heading system that uses a controlled vocabulary), preferred terms (subject headings in this case) have to be chosen to handle choices between variant spellings of the same word (American versus British), choice among scientific and popular terms (cockroach versus Periplaneta americana), and choices between synonyms (automobile versus car), among other difficult issues. Choices of preferred terms are based on
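The core of preferred-term control is a many-to-one mapping from variant words to a single preferred term. The sketch below illustrates this in Python; which variant wins (for instance "Automobiles" over "car", or "Color" over "colour") is an assumption made for the example, not a claim about Library of Congress practice, and the function names are hypothetical.

```python
# Hypothetical entry-term table: variant (lowercased) -> preferred term.
PREFERRED = {
    "car": "Automobiles",
    "automobile": "Automobiles",
    "automobiles": "Automobiles",
    "colour": "Color",            # British spelling routed to the chosen form
    "color": "Color",
    "periplaneta americana": "American cockroach",  # scientific name -> popular term
    "american cockroach": "American cockroach",
}


def normalize(term):
    """Map a user's word to the preferred term (case-insensitive); None if unknown."""
    return PREFERRED.get(term.strip().lower())


def index_document(words):
    """Set of preferred terms for a list of indexer-chosen words, ignoring unknown words."""
    return {p for w in words if (p := normalize(w)) is not None}
```

Because "car" and "automobile" both collapse to one preferred term, a search for either retrieves the same documents, which is the synonym control described above.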
the same thing. Web searching could be dramatically improved by the development of a controlled vocabulary for describing Web pages; the use of such a vocabulary could culminate in a Semantic Web, in which the content of Web pages is described using a machine-readable metadata scheme. One of the first proposals for such a scheme is the Dublin Core Initiative. An example of a controlled vocabulary which
the subject area that the indexer might have tagged the article with a different term (one the searcher might nevertheless consider equivalent). Essentially, this can be avoided only by an experienced user of controlled vocabulary whose understanding of the vocabulary coincides with that of the indexer. Another possibility is that the article is just not tagged by the indexer because indexing exhaustivity
the subject area. Controlled vocabulary terms can accurately describe what a given document is actually about, even if the terms themselves do not occur within the document's text. Well known subject heading systems include the Library of Congress system, Medical Subject Headings (MeSH) created by the United States National Library of Medicine, and Sears. Well known thesauri include the Art and Architecture Thesaurus and
the term in the following circumstances: when the phrase is quoted (e.g., "kidney allograft"), when the term is truncated with an asterisk (e.g., kidney allograft*), and when searching with field tags (e.g., Cancer [ti]). At ClinicalTrials.gov, each trial has keywords that describe the trial. The ClinicalTrials.gov team assigns each trial two sets of MeSH terms. One set is for the conditions studied by
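The three exceptions can be expressed as a rough classifier of query strings. The Python sketch below is only a heuristic restatement of the rules in this paragraph; PubMed's actual query parser handles many more cases, and the function name is hypothetical.

```python
import re


def atm_applies(query):
    """Heuristic: False for the three cases in which PubMed skips automatic term mapping."""
    q = query.strip()
    if len(q) > 1 and q.startswith('"') and q.endswith('"'):
        return False  # quoted phrase, e.g. "kidney allograft"
    if q.endswith("*"):
        return False  # truncation, e.g. kidney allograft*
    if re.search(r"\[[A-Za-z ]+\]", q):
        return False  # field tag, e.g. Cancer [ti]
    return True
```

A plain query such as `heart attack` falls through all three checks, so it would be subject to mapping; this is when the 'Details' tab is most useful for seeing what the query became.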
the text itself. Indexers trying to choose the appropriate index terms might misinterpret the author, while this precise problem is not a factor in free text searching, as it uses the author's own words. The use of controlled vocabularies can be costly compared to free text searches because human experts or expensive automated systems are necessary to index each entry. Furthermore, the user has to be familiar with
the trial and the other for the set of interventions used in the trial. The XML file that can be downloaded for each trial contains these MeSH keywords. The XML file also has a comment that says: "the assignment of MeSH keywords is done by imperfect algorithm". The top-level categories in the MeSH descriptor hierarchy are:
Controlled vocabulary
In library and information science, controlled vocabulary
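Reading the MeSH keywords out of a downloaded trial record might look like the Python sketch below, which uses the standard library's `xml.etree.ElementTree`. The record is hypothetical, and the element names (`condition_browse`, `intervention_browse`, `mesh_term`) are modelled on the legacy ClinicalTrials.gov XML layout; treat them as assumptions and check them against a real record before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical record fragment; element names are assumptions.
SAMPLE = """
<clinical_study>
  <condition_browse>
    <mesh_term>Asthma</mesh_term>
  </condition_browse>
  <intervention_browse>
    <mesh_term>Albuterol</mesh_term>
  </intervention_browse>
</clinical_study>
"""


def mesh_keywords(xml_text):
    """Return (condition MeSH terms, intervention MeSH terms) from one trial record."""
    root = ET.fromstring(xml_text.strip())
    conditions = [e.text for e in root.findall("./condition_browse/mesh_term")]
    interventions = [e.text for e in root.findall("./intervention_browse/mesh_term")]
    return conditions, interventions
```

Keeping the two sets separate mirrors the registry's own split between conditions studied and interventions used.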
the trial is recruiting participants. Once all participants have been recruited, the trial record may be updated to indicate that it is closed to recruitment. Once all measurements are collected (the trial formally completes), the trial status is updated to 'complete'. If the trial terminates for some reason (e.g., lack of enrollment, evidence of initial adverse outcomes), the status may be updated to 'terminated'. Once final trial results are known or legal deadlines are met,
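The status changes described here form a small state machine. The Python sketch below encodes only the transitions mentioned in the text; the status labels follow the wording of this article rather than the exact values used by ClinicalTrials.gov, and the allowed-transition table and function name are assumptions.

```python
from enum import Enum


class Status(Enum):
    RECRUITING = "recruiting"
    CLOSED_TO_RECRUITMENT = "closed to recruitment"
    COMPLETE = "complete"
    TERMINATED = "terminated"


# Assumed transitions: termination (e.g. for lack of enrollment) can happen
# during or after recruitment; 'complete' and 'terminated' are final.
ALLOWED = {
    Status.RECRUITING: {Status.CLOSED_TO_RECRUITMENT, Status.TERMINATED},
    Status.CLOSED_TO_RECRUITMENT: {Status.COMPLETE, Status.TERMINATED},
    Status.COMPLETE: set(),
    Status.TERMINATED: set(),
}


def update(current, new):
    """Return the new status, or raise ValueError for a transition not in the table."""
    if new not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value!r} to {new.value!r}")
    return new
```

A registry built this way would reject, for example, moving a completed trial back to recruiting.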
the trial record manager may upload basic summary results to the registry either by filling in a complex web-based form or by submitting a compliant XML file. To search in ClinicalTrials.gov, users filter by All Studies, or select a certain phase in the study's recruitment. Then the user enters a search keyword or phrase into at least one of the provided search fields. Next, the user clicks the Search button, and results populate according to
the type of material that the article represents (publication types), and supplementary concept records (SCR), which describe substances such as chemical products and drugs that are not included in the headings (see below as "Supplements"). The descriptors or subject headings are arranged in a hierarchy. A given descriptor may appear at several locations in the hierarchical tree. The tree locations carry systematic labels known as tree numbers, and consequently one descriptor can carry several tree numbers. For example,
the user's input. The database for Aggregate Analysis of ClinicalTrials.gov (AACT) is a publicly available source based on the data in ClinicalTrials.gov. It was designed to facilitate aggregate analysis by normalizing some of the metadata across trials. PubMed is another resource managed by the National Library of Medicine. A trial with an NCT identification number that is registered in ClinicalTrials.gov can be linked to
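An abstract trial-article link depends on the NCT number appearing in the abstract, so detecting such links starts with finding identifiers of the form "NCT" followed by eight digits. A minimal Python sketch with a hypothetical function name; real abstracts sometimes format identifiers irregularly (for example with a space after "NCT"), which this pattern deliberately does not handle.

```python
import re

# ClinicalTrials.gov identifiers: "NCT" followed by exactly eight digits.
NCT_ID = re.compile(r"\bNCT\d{8}\b")


def trial_ids_in_abstract(abstract):
    """Distinct trial identifiers mentioned in an abstract, sorted."""
    return sorted(set(NCT_ID.findall(abstract)))
```

The word boundaries stop the pattern from matching inside longer digit runs, so a malformed nine-digit string is ignored rather than truncated.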
the work will not be described with index terms. In general, the higher the indexing exhaustivity, the more terms are indexed for each document. In recent years, free text search as a means of access to documents has become popular. This involves using natural language indexing with indexing exhaustivity set to the maximum (every word in the text is indexed). These methods have been compared in some studies, such as
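Free text indexing at maximum exhaustivity is essentially an inverted index over every word. The Python sketch below builds one and runs an AND query; the tokenization (lowercase letters and digits only, no stemming) is a simplifying assumption, and the function names are hypothetical.

```python
import re
from collections import defaultdict


def build_full_text_index(docs):
    """Maximum exhaustivity: every word of every document becomes an index entry."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def search_all(index, *words):
    """Ids of documents containing every query word."""
    sets = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*sets) if sets else set()
```

Because there is no stemming, a query for "schools" does not match a document that only says "school", which is the kind of variant problem that controlled vocabularies address.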
was introduced in the 1960s, with the NLM's own index catalogue and the subject headings of the Quarterly Cumulative Index Medicus (1940 edition) as precursors. The yearly printed version of MeSH was discontinued in 2007; MeSH is now available only online. It can be browsed and downloaded free of charge through PubMed. Originally in English, MeSH has been translated into numerous other languages and allows retrieval of documents from different origins. MeSH vocabulary
was reported on ClinicalTrials.gov versus the peer-reviewed journal articles reporting the same studies. The trial typically goes through stages of initial registration, ongoing record updates, and basic summary result submission. Each trial record is administered by a trial record manager. A trial record manager typically provides initial trial registration before the study enrolls its first participant. This also facilitates informing potential participants that