In statistics, probability theory and information theory, pointwise mutual information (PMI), or point mutual information, is a measure of association. It compares the probability of two events occurring together to what this probability would be if the events were independent.
A text corpus can be used to approximate the probabilities p(x) and p(x,y) respectively. The following table shows counts of pairs of words getting the most and the least PMI scores in the first 50 million words in Wikipedia (dump of October 2015), filtering by 1,000 or more co-occurrences. The frequency of each count can be obtained by dividing its value by 50,000,952. (Note: the natural log is used to calculate the PMI values in this example, instead of log base 2.)
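As a sketch of how such scores are computed, the snippet below estimates PMI from raw counts. The pair and word counts are invented placeholders, not figures from the Wikipedia dump; only the corpus size of 50,000,952 words comes from the text above.

```python
# Minimal sketch: estimating PMI for a word pair from corpus counts.
# Uses the natural log, matching the example in the text.
import math

N = 50_000_952  # total words in the corpus sample (from the text)

def pmi_from_counts(c_xy: int, c_x: int, c_y: int, n: int = N) -> float:
    """PMI of a word pair from co-occurrence and unigram counts."""
    p_xy, p_x, p_y = c_xy / n, c_x / n, c_y / n
    return math.log(p_xy / (p_x * p_y))

# Hypothetical counts for a strongly associated pair:
print(pmi_from_counts(c_xy=1_938, c_x=2_682, c_y=2_040))  # large positive PMI
```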
In 2010, Tomáš Mikolov (then a PhD student at Brno University of Technology) with co-authors applied a simple recurrent neural network with a single hidden layer to language modelling, and in the following years he went on to develop Word2vec. In the 2010s, representation learning and deep neural network-style (featuring many hidden layers) machine learning methods became widespread in natural language processing.
The PMI of a pair of outcomes x and y belonging to discrete random variables X and Y quantifies the discrepancy between the probability of their coincidence given their joint distribution and their individual distributions, assuming independence. Mathematically:

\operatorname{pmi}(x;y) \equiv \log_2 \frac{p(x,y)}{p(x)\,p(y)} = \log_2 \frac{p(x\mid y)}{p(x)} = \log_2 \frac{p(y\mid x)}{p(y)}

(with the latter two expressions being equal to the first by Bayes' theorem). The mutual information (MI) of the random variables X and Y is the expected value of the PMI over all possible outcomes.
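A minimal sketch of this definition, using illustrative probabilities chosen here (not taken from the text), confirms that all three expressions agree:

```python
# All three forms of the PMI definition give the same value.
import math

def pmi(p_xy: float, p_x: float, p_y: float) -> float:
    """Pointwise mutual information in bits (log base 2)."""
    return math.log2(p_xy / (p_x * p_y))

p_x, p_y, p_xy = 0.2, 0.25, 0.15            # illustrative probabilities
print(pmi(p_xy, p_x, p_y))                  # log2(p(x,y) / (p(x) p(y)))
print(math.log2((p_xy / p_y) / p_x))        # log2(p(x|y) / p(x)), same value
print(math.log2((p_xy / p_x) / p_y))        # log2(p(y|x) / p(y)), same value
```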
Natural language processing (NLP) is a subfield of computer science and especially artificial intelligence. It is primarily concerned with providing computers with the ability to process data encoded in natural language and is thus closely related to information retrieval, knowledge representation and computational linguistics, a subfield of linguistics. Typically, data is collected in text corpora, using either rule-based, statistical or neural-based approaches in machine learning and deep learning. Major tasks in natural language processing are speech recognition, text classification, natural-language understanding, and natural-language generation. Natural language processing has its roots in the 1950s.
PMI (especially in its positive pointwise mutual information variant) has been described as "one of the most important concepts in NLP", where it "draws on the intuition that the best way to weigh the association between two words is to ask how much more the two words co-occur in [a] corpus than we would have expected them to appear by chance".
Like mutual information, pointwise mutual information follows the chain rule, that is,

\operatorname{pmi}(x;yz) = \operatorname{pmi}(x;y) + \operatorname{pmi}(x;z\mid y)

This is proven through application of Bayes' theorem:

\operatorname{pmi}(x;y) + \operatorname{pmi}(x;z\mid y) = \log_2 \frac{p(x,y)}{p(x)\,p(y)} + \log_2 \frac{p(x,z\mid y)}{p(x\mid y)\,p(z\mid y)} = \log_2 \frac{p(x,y,z)}{p(x)\,p(y,z)} = \operatorname{pmi}(x;yz)

PMI can be used in various disciplines, e.g. in information theory, linguistics or chemistry (in profiling and analysis of chemical compounds). In computational linguistics, PMI has been used for finding collocations and associations between words, based on counts of occurrences and co-occurrences of words in a text corpus.
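The chain rule can be checked numerically, as in the sketch below; the joint distribution over three binary variables is invented for illustration.

```python
# Verifying pmi(x; yz) = pmi(x; y) + pmi(x; z|y) on an invented
# joint distribution over three binary variables.
import math

p = {
    (0, 0, 0): 0.10, (0, 0, 1): 0.05, (0, 1, 0): 0.30, (0, 1, 1): 0.25,
    (1, 0, 0): 0.05, (1, 0, 1): 0.05, (1, 1, 0): 0.10, (1, 1, 1): 0.10,
}

def marg(fixed):  # marginal probability of an assignment to a subset of axes
    return sum(v for k, v in p.items() if all(k[i] == a for i, a in fixed))

x, y, z = 0, 1, 0
p_x, p_y = marg([(0, x)]), marg([(1, y)])
p_xy, p_yz, p_xyz = marg([(0, x), (1, y)]), marg([(1, y), (2, z)]), p[(x, y, z)]

pmi_x_yz = math.log2(p_xyz / (p_x * p_yz))                            # pmi(x; yz)
pmi_x_y = math.log2(p_xy / (p_x * p_y))                               # pmi(x; y)
pmi_x_z_y = math.log2((p_xyz / p_y) / ((p_xy / p_y) * (p_yz / p_y)))  # pmi(x; z|y)
assert abs(pmi_x_yz - (pmi_x_y + pmi_x_z_y)) < 1e-12
```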
Here is an example to illustrate. Consider the following joint distribution p(x,y):

        y = 0    y = 1
x = 0   0.1      0.7
x = 1   0.15     0.05

Using this table we can marginalize to get the following individual distributions:

p(x=0) = 0.8, p(x=1) = 0.2, p(y=0) = 0.25, p(y=1) = 0.75

With this example, we can compute four values for pmi(x;y). Using base-2 logarithms:

pmi(x=0; y=0) = −1
pmi(x=0; y=1) ≈ 0.222392
pmi(x=1; y=0) ≈ 1.584963
pmi(x=1; y=1) ≈ −1.584963

(For reference, the mutual information I(X;Y) would then be 0.2141709.)
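A short script reproducing the four PMI values and the mutual information from the joint table (a sketch; the variable names are my own):

```python
# Recomputing the four base-2 PMI values and I(X;Y) from the joint table.
import math

p = {(0, 0): 0.1, (0, 1): 0.7, (1, 0): 0.15, (1, 1): 0.05}  # p(x, y)
p_x = {x: sum(v for (a, _), v in p.items() if a == x) for x in (0, 1)}
p_y = {y: sum(v for (_, b), v in p.items() if b == y) for y in (0, 1)}

mi = 0.0
for (x, y), p_xy in sorted(p.items()):
    val = math.log2(p_xy / (p_x[x] * p_y[y]))
    mi += p_xy * val                     # MI is the expectation of PMI
    print(f"pmi(x={x}; y={y}) = {val:+.6f}")

print(f"I(X;Y) = {mi:.7f}")              # prints 0.2141709
```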
Based on long-standing trends in the field, it is possible to extrapolate future directions of NLP. As of 2020, three trends among the topics of the long-standing series of CoNLL Shared Tasks could be observed. Most higher-level NLP applications involve aspects that emulate intelligent behaviour and apparent comprehension of natural language. More broadly speaking, the technical operationalization of increasingly advanced aspects of cognitive behaviour represents one of the developmental trajectories of NLP (see trends among CoNLL shared tasks above).
This definition is motivated by the observation that "negative PMI values (which imply things are co-occurring less often than we would expect by chance) tend to be unreliable unless our corpora are enormous" and also by a concern that "it's not clear whether it's even possible to evaluate such scores of 'unrelatedness' with human judgment". It also avoids having to deal with −∞ values for events that never occur together (p(x,y) = 0), by setting PPMI for these to 0. Pointwise mutual information can be normalized between [−1, +1], resulting in −1 (in the limit) for events never occurring together, 0 for independence, and +1 for complete co-occurrence.
PMI is maximized when X and Y are perfectly associated (i.e. p(x|y) = 1 or p(y|x) = 1), yielding the following bounds:

-\infty \le \operatorname{pmi}(x;y) \le \min\left(-\log_2 p(x),\, -\log_2 p(y)\right)

Finally, pmi(x;y) will increase if p(x|y) is fixed but p(x) decreases.
Several variations of PMI have been proposed, in particular to address what has been described as its "two main limitations": it can take both positive and negative values with no fixed bounds, and it has a known bias towards low-frequency events. The positive pointwise mutual information (PPMI) measure is defined by setting negative values of PMI to zero:

\operatorname{ppmi}(x;y) \equiv \max\left(\log_2 \frac{p(x,y)}{p(x)\,p(y)},\, 0\right)
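A minimal sketch of PPMI following this definition, including the convention noted earlier of mapping never-co-occurring pairs to 0:

```python
# PPMI: clip negative PMI to zero; define PPMI = 0 when p(x,y) = 0.
import math

def ppmi(p_xy: float, p_x: float, p_y: float) -> float:
    if p_xy == 0.0:           # never co-occur: 0 rather than -infinity
        return 0.0
    return max(math.log2(p_xy / (p_x * p_y)), 0.0)

print(ppmi(0.15, 0.2, 0.25))  # positive PMI is kept (~1.585)
print(ppmi(0.01, 0.2, 0.25))  # negative PMI is clipped to 0.0
print(ppmi(0.00, 0.2, 0.25))  # undefined PMI is mapped to 0.0
```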
The measure is symmetric (pmi(x;y) = pmi(y;x)). It can take positive or negative values, but is zero if X and Y are independent. Note that even though PMI may be negative or positive, its expected outcome over all joint events (MI) is non-negative.
The pmi^k measure (for k = 2, 3, etc.), which was introduced by Béatrice Daille around 1994, and as of 2011 was described as being "among the most widely used variants", is defined as

\operatorname{pmi}^k(x;y) \equiv \log_2 \frac{p(x,y)^k}{p(x)\,p(y)} = \operatorname{pmi}(x;y) + (k-1)\log_2 p(x,y)

In particular, \operatorname{pmi}^1(x;y) = \operatorname{pmi}(x;y). The additional factors of p(x,y) inside the logarithm are intended to correct the bias of PMI towards low-frequency events, by boosting the scores of frequent pairs.
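A sketch of this family of measures, checking the identity above for a few values of k (the probabilities are illustrative):

```python
# pmi^k for k = 1, 2, 3: k = 1 recovers plain PMI, and each increment adds
# log2 p(x,y) (a negative number), which relatively boosts frequent pairs.
import math

def pmi_k(p_xy: float, p_x: float, p_y: float, k: int) -> float:
    return math.log2(p_xy ** k / (p_x * p_y))

p_xy, p_x, p_y = 0.15, 0.2, 0.25
base = math.log2(p_xy / (p_x * p_y))
for k in (1, 2, 3):
    # identity from the text: pmi^k = pmi + (k - 1) * log2 p(x,y)
    assert abs(pmi_k(p_xy, p_x, p_y, k) - (base + (k - 1) * math.log2(p_xy))) < 1e-9
    print(k, pmi_k(p_xy, p_x, p_y, k))
```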
Good collocation pairs have high PMI because the probability of co-occurrence is only slightly lower than the probabilities of occurrence of each word. Conversely, a pair of words whose probabilities of occurrence are considerably higher than their probability of co-occurrence gets a small PMI score.
The premise of symbolic NLP is well-summarized by John Searle's Chinese room experiment: given a collection of rules (e.g., a Chinese phrasebook, with questions and matching answers), the computer emulates natural language understanding (or other NLP tasks) by applying those rules to the data it confronts. Up until the 1980s, most natural language processing systems were based on complex sets of hand-written rules.
More recently, ideas of cognitive NLP have been revived as an approach to achieve explainability, e.g., under the notion of "cognitive AI". Likewise, ideas of cognitive NLP are inherent to neural models of multimodal NLP (although rarely made explicit) and to developments in artificial intelligence, specifically tools and technologies using large language model approaches and new directions in artificial general intelligence.
Pointwise mutual information has many of the same relationships as the mutual information. In particular,

\begin{aligned}\operatorname{pmi}(x;y)&=h(x)+h(y)-h(x,y)\\&=h(x)-h(x\mid y)\\&=h(y)-h(y\mid x)\end{aligned}

where h(x) = -\log_2 p(x) is the self-information.
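These identities can be checked numerically; the sketch below uses one cell of the worked example above.

```python
# pmi(x;y) = h(x) + h(y) - h(x,y) = h(x) - h(x|y) = h(y) - h(y|x),
# checked on the cell x = 1, y = 0 of the example table.
import math

p_xy, p_x, p_y = 0.15, 0.2, 0.25
h = lambda p: -math.log2(p)                  # self-information in bits

pmi = math.log2(p_xy / (p_x * p_y))
assert abs(pmi - (h(p_x) + h(p_y) - h(p_xy))) < 1e-12
assert abs(pmi - (h(p_x) - h(p_xy / p_y))) < 1e-12   # h(x|y) = -log2 p(x|y)
assert abs(pmi - (h(p_y) - h(p_xy / p_x))) < 1e-12   # h(y|x) = -log2 p(y|x)
print(pmi)                                   # ~1.585 bits
```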
Already in 1950, Alan Turing published an article titled "Computing Machinery and Intelligence" which proposed what is now called the Turing test as a criterion of intelligence, though at the time that was not articulated as a problem separate from artificial intelligence. The proposed test includes a task that involves the automated interpretation and generation of natural language.
In the late 1980s and mid-1990s, the statistical approach ended a period of AI winter, which was caused by the inefficiencies of the rule-based approaches. The earliest decision trees, producing systems of hard if-then rules, were still very similar to the old rule-based approaches. Only the introduction of hidden Markov models, applied to part-of-speech tagging, announced the end of the old rule-based approach.
Especially during the age of symbolic NLP, the area of computational linguistics maintained strong ties with cognitive studies. As an example, George Lakoff offers a methodology to build natural language processing (NLP) algorithms through the perspective of cognitive science, along with the findings of cognitive linguistics, with two defining aspects. Ties with cognitive linguistics are part of the historical heritage of NLP, but they have been less frequently addressed since the statistical turn during the 1990s.
Cognition refers to "the mental action or process of acquiring knowledge and understanding through thought, experience, and the senses." Cognitive science is the interdisciplinary, scientific study of the mind and its processes. Cognitive linguistics is an interdisciplinary branch of linguistics, combining knowledge and research from both psychology and linguistics.
A major drawback of statistical methods is that they require elaborate feature engineering. Since 2015, the statistical approach has been replaced by the neural networks approach, using semantic networks and word embeddings to capture semantic properties of words. Intermediate tasks (e.g., part-of-speech tagging and dependency parsing) are no longer needed. Neural machine translation, based on then-newly-invented sequence-to-sequence transformations, made obsolete the intermediate steps, such as word alignment, previously necessary for statistical machine translation.
The symbolic approach, i.e., the hand-coding of a set of rules for manipulating symbols, coupled with a dictionary lookup, was historically the first approach used both by AI in general and by NLP in particular: such as by writing grammars or devising heuristic rules for stemming. Machine learning approaches, which include both statistical and neural networks, on the other hand, have many advantages over the symbolic approach. Although rule-based systems for manipulating symbols were still in use in 2020, they have become mostly obsolete with the advance of LLMs in 2023.
The following is a list of some of the most commonly researched tasks in natural language processing. Some of these tasks have direct real-world applications, while others more commonly serve as subtasks that are used to aid in solving larger tasks. Though natural language processing tasks are closely intertwined, they can be subdivided into categories for convenience. A coarse division is given below.
Starting in the late 1980s, however, there was a revolution in natural language processing with the introduction of machine learning algorithms for language processing. This was due to both the steady increase in computational power (see Moore's law) and the gradual lessening of the dominance of Chomskyan theories of linguistics (e.g. transformational grammar), whose theoretical underpinnings discouraged the sort of corpus linguistics that underlies the machine-learning approach to language processing.
This normalized pointwise mutual information (npmi) is defined as

\operatorname{npmi}(x;y) = \frac{\operatorname{pmi}(x;y)}{h(x,y)}

where h(x,y) = -\log_2 p(x,y) is the joint self-information.
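A minimal sketch of npmi per this definition, showing the three landmark values (the probabilities are illustrative):

```python
# NPMI: PMI divided by the joint self-information, giving values in [-1, +1].
import math

def npmi(p_xy: float, p_x: float, p_y: float) -> float:
    return math.log2(p_xy / (p_x * p_y)) / -math.log2(p_xy)

print(npmi(0.30, 0.3, 0.3))    # complete co-occurrence: +1.0
print(npmi(0.06, 0.2, 0.3))    # independence (p_xy = p_x * p_y): 0.0
print(npmi(0.001, 0.2, 0.3))   # negative association; -> -1 as p_xy -> 0
```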
A 2011 case study demonstrated the success of pmi^3 in correcting this bias on a corpus drawn from English Wikipedia. Taking x to be the word "football", its most strongly associated words y according to the PMI measure (i.e. those maximizing pmi(x;y)) were domain-specific ("midfielder", "cornerbacks", "goalkeepers"), whereas the terms ranked most highly by pmi^3 were much more general ("league", "clubs", "england").
In 2003, the word n-gram model, at the time the best statistical algorithm, was outperformed by a multi-layer perceptron (with a single hidden layer and a context length of several words, trained on up to 14 million words with a CPU cluster in language modelling) by Yoshua Bengio with co-authors.
Nevertheless, approaches to develop cognitive models towards technically operationalizable frameworks have been pursued in the context of various frameworks, e.g., cognitive grammar, functional grammar, construction grammar, computational psycholinguistics and cognitive neuroscience (e.g., ACT-R), however, with limited uptake in mainstream NLP (as measured by presence at major conferences of the ACL).
Total correlation is an extension of mutual information to multiple variables. Analogously to the definition of total correlation, the extension of PMI to multiple variables is "specific correlation". The specific correlation (SI) of the results of random variables {\boldsymbol x} = (x_1, x_2, \ldots, x_n) is expressed as

\operatorname{SI}({\boldsymbol x}) = \log_2 \frac{p({\boldsymbol x})}{p(x_1)\,p(x_2)\cdots p(x_n)}
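A minimal sketch of this multivariate extension (the probabilities are illustrative):

```python
# Specific correlation: log2 of the joint probability over the product
# of the marginals; with n = 2 this reduces to ordinary PMI.
import math

def specific_correlation(p_joint: float, marginals: list) -> float:
    return math.log2(p_joint / math.prod(marginals))

print(specific_correlation(0.05, [0.2, 0.25, 0.3]))   # three variables
print(specific_correlation(0.15, [0.2, 0.25]))        # n = 2: plain PMI
```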
The concept was introduced in 1961 by Robert Fano under the name of "mutual information", but today that term is instead used for a related measure of dependence between random variables: the mutual information (MI) of two discrete random variables refers to the average PMI of all possible events.
That popularity was due partly to a flurry of results showing that such techniques can achieve state-of-the-art results in many natural language tasks, e.g., in language modeling and parsing. This is increasingly important in medicine and healthcare, where NLP helps analyze notes and text in electronic health records that would otherwise be inaccessible for study when seeking to improve care or protect patient privacy.