In econometrics and statistics, the generalized method of moments (GMM) is a generic method for estimating parameters in statistical models. Usually it is applied in the context of semiparametric models, where the parameter of interest is finite-dimensional, whereas the full shape of the data's distribution function may not be known, and therefore maximum likelihood estimation is not applicable.
The method requires that a certain number of moment conditions be specified for the model. These moment conditions are functions of the model parameters and the data, such that their expectation is zero at the parameters' true values. The GMM method then minimizes a certain norm of the sample averages of the moment conditions, and can therefore be thought of as a special case of minimum-distance estimation. The GMM estimators are known to be consistent, asymptotically normal, and most efficient in the class of all estimators that do not use any extra information aside from that contained in the moment conditions. GMM was advocated by Lars Peter Hansen in 1982 as a generalization of the method of moments, introduced by Karl Pearson in 1894. However, these estimators are mathematically equivalent to those based on "orthogonality conditions" (Sargan, 1958, 1959) or "unbiased estimating equations" (Huber, 1967; Wang et al., 1997).
Suppose the available data consist of T observations {Y_t}, t = 1, …, T, where each observation Y_t is an n-dimensional multivariate random variable. We assume that the data come from a certain statistical model, defined up to an unknown parameter θ ∈ Θ. The goal of the estimation problem is to find the "true" value of this parameter, θ0, or at least a reasonably close estimate.

A general assumption of GMM is that the data Y_t are generated by a weakly stationary ergodic stochastic process. (The case of independent and identically distributed (iid) variables Y_t is a special case of this condition.)

In order to apply GMM, we need "moment conditions", that is, we need to know a vector-valued function g(Y, θ) such that

{\displaystyle m(\theta )\equiv \operatorname {E} [g(Y_{t},\theta )]=0\quad {\text{at }}\theta =\theta _{0},}

where E denotes expectation and Y_t is a generic observation. Moreover, the function m(θ) must differ from zero for θ ≠ θ0, otherwise the parameter θ will not be point-identified.
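As a concrete illustration (not part of the source text), consider a linear instrumental-variables model y_t = x_t'θ0 + ε_t with instruments z_t satisfying E[z_t ε_t] = 0, so that one may take g(Y_t, θ) = z_t (y_t − x_t'θ). A minimal Python sketch, with illustrative data layout and function names:

```python
import numpy as np

def g(Y, theta):
    """Moment function for a linear IV model:
    g(Y_t, theta) = z_t * (y_t - x_t' theta),
    which has expectation zero at theta = theta_0
    when the instruments are valid (E[z_t eps_t] = 0)."""
    y, x, z = Y                    # y: (T,), x: (T, l), z: (T, k), with k >= l
    resid = y - x @ theta          # (T,) residuals at the candidate theta
    return z * resid[:, None]      # (T, k): one k-vector of moments per observation

def m_hat(Y, theta):
    """Sample analogue of m(theta) = E[g(Y_t, theta)]."""
    return g(Y, theta).mean(axis=0)

# Illustrative usage with simulated data (true theta = 2.0):
rng = np.random.default_rng(0)
T, k = 500, 2
z = rng.normal(size=(T, k))
x = z @ np.array([[1.0], [0.5]]) + rng.normal(size=(T, 1))
y = x @ np.array([2.0]) + rng.normal(size=T)
print(m_hat((y, x, z), np.array([2.0])))  # close to [0, 0] at the true parameter
```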
The basic idea behind GMM is to replace the theoretical expected value E[⋅] with its empirical analog, the sample average

{\displaystyle {\hat {m}}(\theta )\equiv {\frac {1}{T}}\sum _{t=1}^{T}g(Y_{t},\theta ),}

and then to minimize the norm of this expression with respect to θ. The minimizing value of θ is our estimate for θ0.

By the law of large numbers, m̂(θ) ≈ E[g(Y_t, θ)] = m(θ) for large values of T, and thus we expect that m̂(θ0) ≈ m(θ0) = 0. The generalized method of moments looks for a number θ̂ which would make m̂(θ̂) as close to zero as possible. Mathematically, this is equivalent to minimizing a certain norm of m̂(θ) (the norm of m, denoted as ||m||, measures the distance between m and zero). The properties of the resulting estimator will depend on the particular choice of the norm function, and therefore the theory of GMM considers an entire family of norms, defined as

{\displaystyle \|{\hat {m}}(\theta )\|_{W}^{2}={\hat {m}}(\theta )^{\mathsf {T}}\,W\,{\hat {m}}(\theta ),}

where W is a positive-definite weighting matrix, and the superscript T denotes transposition. In practice, the weighting matrix W is computed based on the available data set; the resulting matrix will be denoted Ŵ. Thus, the GMM estimator can be written as

{\displaystyle {\hat {\theta }}=\operatorname {arg\,min} _{\theta \in \Theta }\ {\hat {m}}(\theta )^{\mathsf {T}}\,{\hat {W}}\,{\hat {m}}(\theta ).}

Under suitable conditions this estimator is consistent, asymptotically normal, and with the right choice of weighting matrix Ŵ also asymptotically efficient.
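A sketch of the resulting minimization, reusing the hypothetical g and m_hat helpers from the previous snippet; the choice of scipy's BFGS optimizer is an illustrative assumption, not a prescription:

```python
import numpy as np
from scipy.optimize import minimize

def gmm_objective(theta, Y, W):
    """Quadratic-form GMM criterion: m_hat(theta)' W m_hat(theta)."""
    m = m_hat(Y, theta)
    return m @ W @ m

def gmm_estimate(Y, W, theta_init):
    """Minimize the GMM criterion over theta; returns the point estimate."""
    res = minimize(gmm_objective, theta_init, args=(Y, W), method="BFGS")
    return res.x

# First-pass estimate with the identity weighting matrix, e.g.:
# theta1 = gmm_estimate(Y, np.eye(k), theta_init=np.zeros(l))
```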
Consistency is a statistical property of an estimator stating that, having a sufficient number of observations, the estimator will converge in probability to the true value of the parameter:

{\displaystyle {\hat {\theta }}\ {\xrightarrow {p}}\ \theta _{0}\quad {\text{as }}T\to \infty .}

Sufficient conditions for a GMM estimator to be consistent include: the estimated weighting matrices converge in probability to a positive semi-definite limit W; W m(θ) = 0 holds only for θ = θ0 (the so-called global identification condition); the parameter space Θ is compact; and g(Y, θ) is continuous in θ and appropriately dominated. The global identification condition is often particularly hard to verify. There exist simpler necessary but not sufficient conditions which may be used to detect a non-identification problem, such as the order condition that the number of moment conditions be at least as large as the number of parameters. In practice applied econometricians often simply assume that global identification holds, without actually proving it.
Asymptotic normality is a useful property, as it allows us to construct confidence bands for the estimator and conduct different tests. Before we can make a statement about the asymptotic distribution of the GMM estimator, we need to define two auxiliary matrices:

{\displaystyle G=\operatorname {E} [\nabla _{\theta }\,g(Y_{t},\theta _{0})],\qquad \Omega =\operatorname {E} [g(Y_{t},\theta _{0})\,g(Y_{t},\theta _{0})^{\mathsf {T}}].}

Then under standard regularity conditions (consistency of θ̂, compactness of Θ, smoothness of g in a neighborhood of θ0, finite second moments of g, and full column rank of G), the GMM estimator will be asymptotically normal with limiting distribution

{\displaystyle {\sqrt {T}}{\big (}{\hat {\theta }}-\theta _{0}{\big )}\ {\xrightarrow {d}}\ {\mathcal {N}}{\big [}0,(G^{\mathsf {T}}WG)^{-1}G^{\mathsf {T}}W\Omega W^{\mathsf {T}}G(G^{\mathsf {T}}W^{\mathsf {T}}G)^{-1}{\big ]}.}
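In practice this covariance is estimated by plugging sample analogues of G and Ω into the sandwich formula above. A minimal sketch, assuming Ĝ and Ω̂ have already been computed (e.g. Ĝ by numerically differentiating m̂ at θ̂, and Ω̂ as the sample second moment of g under an iid assumption):

```python
import numpy as np

def gmm_cov(G, W, Omega):
    """Limit covariance of sqrt(T)*(theta_hat - theta_0):
    (G'WG)^{-1} G'W Omega W'G (G'W'G)^{-1}."""
    A = np.linalg.inv(G.T @ W @ G)
    B = G.T @ W @ Omega @ W.T @ G
    return A @ B @ np.linalg.inv(G.T @ W.T @ G)

def std_errors(G, W, Omega, T):
    """Standard errors of theta_hat itself (limit covariance scaled by 1/T)."""
    return np.sqrt(np.diag(gmm_cov(G, W, Omega) / T))
```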
So far we have said nothing about the choice of matrix W, except that it must be positive semi-definite. In fact any such matrix will produce a consistent and asymptotically normal GMM estimator; the only difference will be in the asymptotic variance of that estimator. It can be shown that taking

{\displaystyle W\propto \ \Omega ^{-1}}

will result in the most efficient estimator in the class of all (generalized) method of moments estimators. Only an infinite number of orthogonality conditions attains the smallest variance, the Cramér–Rao bound. In this case the formula for the asymptotic distribution of the GMM estimator simplifies to

{\displaystyle {\sqrt {T}}{\big (}{\hat {\theta }}-\theta _{0}{\big )}\ {\xrightarrow {d}}\ {\mathcal {N}}{\big [}0,(G^{\mathsf {T}}\Omega ^{-1}G)^{-1}{\big ]}.}

The proof that such a choice of weighting matrix is indeed locally optimal is often adopted with slight modifications when establishing efficiency of other estimators. As a rule of thumb, a weighting matrix inches closer to optimality when it turns into an expression closer to the Cramér–Rao bound.

One difficulty with implementing the outlined method is that we cannot take W = Ω^{-1}, because, by the definition of the matrix Ω, we need to know the value of θ0 in order to compute it, and θ0 is precisely the quantity we do not know and are trying to estimate in the first place. In the case of Y_t being iid we can estimate Ω as

{\displaystyle {\hat {\Omega }}={\frac {1}{T}}\sum _{t=1}^{T}g(Y_{t},{\hat {\theta }})\,g(Y_{t},{\hat {\theta }})^{\mathsf {T}},}

evaluated at a preliminary estimate θ̂. Several approaches exist to deal with this issue, the first one being the most popular:

- Two-step feasible GMM: estimate θ with a suboptimal weighting matrix (such as the identity), then re-estimate using Ŵ = Ω̂^{-1} computed from the first-step estimate (sketched in the code below).
- Iterated GMM: repeat the two-step procedure until the estimates stop changing.
- Continuously updating GMM (CUE): treat the weighting matrix as a function of θ inside the objective and minimize over both simultaneously.

Another important issue in implementation of the minimization procedure is that the function is supposed to search through a (possibly high-dimensional) parameter space Θ and find the value of θ which minimizes the objective function. No generic recommendation for such a procedure exists; it is a subject of its own field, numerical optimization.
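As referenced in the list above, a minimal sketch of the two-step feasible procedure under the iid assumption, reusing the hypothetical g and gmm_estimate helpers from the earlier snippets:

```python
import numpy as np

def two_step_gmm(Y, theta_init, k):
    """Two-step feasible GMM:
    step 1: consistent estimate with the identity weighting matrix;
    step 2: re-estimate with W_hat = Omega_hat^{-1} from step 1."""
    theta1 = gmm_estimate(Y, np.eye(k), theta_init)
    g1 = g(Y, theta1)                    # (T, k) moments at the first-step estimate
    Omega_hat = g1.T @ g1 / g1.shape[0]  # iid estimate of Omega
    W_hat = np.linalg.inv(Omega_hat)     # efficient weighting matrix
    theta2 = gmm_estimate(Y, W_hat, theta_init)
    return theta2, W_hat
```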
When the number of moment conditions is greater than the dimension of the parameter vector θ, the model is said to be over-identified. Sargan (1958) proposed tests for over-identifying restrictions based on instrumental variables estimators that are distributed in large samples as chi-square variables with degrees of freedom that depend on the number of over-identifying restrictions. Subsequently, Hansen (1982) applied this test to the mathematically equivalent formulation of GMM estimators. Note, however, that such statistics can be negative in empirical applications where the models are misspecified, and likelihood ratio tests can yield insights since the models are estimated under both null and alternative hypotheses (Bhargava and Sargan, 1983).

Conceptually we can check whether m̂(θ̂) is sufficiently close to zero to suggest that the model fits the data well. The GMM method has then replaced the problem of solving the equation m̂(θ) = 0, which chooses θ to match the restrictions exactly, by a minimization calculation. The minimization can always be conducted even when no θ0 exists such that m(θ0) = 0. This is what the J-test does. The J-test is also called a test for over-identifying restrictions. Formally we consider two hypotheses:

- H0: m(θ0) = 0 (the moment conditions hold; the model is valid);
- H1: m(θ) ≠ 0 for all θ ∈ Θ (the model is invalid).

Under hypothesis H0, the following so-called J-statistic is asymptotically chi-squared distributed with k − l degrees of freedom. Define J to be

{\displaystyle J\equiv T\cdot {\hat {m}}({\hat {\theta }})^{\mathsf {T}}\,{\hat {W}}_{T}\,{\hat {m}}({\hat {\theta }})\ {\xrightarrow {d}}\ \chi _{k-\ell }^{2}\quad {\text{under }}H_{0},}

where θ̂ is the GMM estimator of the parameter θ0, k is the number of moment conditions (dimension of vector g), and l is the number of estimated parameters (dimension of vector θ). The matrix Ŵ_T must converge in probability to Ω^{-1}, the efficient weighting matrix (note that previously we only required W to be proportional to Ω^{-1} for the estimator to be efficient; in order to conduct the J-test, however, W must be exactly equal to Ω^{-1}, not simply proportional).

Under the alternative hypothesis H1, the J-statistic is asymptotically unbounded. To conduct the test we compute the value of J from the data; it is a nonnegative number. We compare it with (for example) the 0.95 quantile of the χ²_{k−l} distribution, rejecting H0 whenever J exceeds that quantile.
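A minimal sketch of this computation, again reusing the hypothetical helpers from the earlier snippets (k and l are the numbers of moment conditions and parameters):

```python
from scipy.stats import chi2

def j_test(Y, theta_hat, W_hat, k, l, alpha=0.05):
    """Hansen's J-test: J = T * m_hat' W_hat m_hat ~ chi2(k - l) under H0."""
    T = g(Y, theta_hat).shape[0]
    m = m_hat(Y, theta_hat)
    J = T * (m @ W_hat @ m)
    crit = chi2.ppf(1 - alpha, df=k - l)   # e.g. the 0.95 quantile
    return J, crit, J > crit               # reject H0 when J exceeds the quantile
```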
Many other popular estimation techniques can be cast in terms of GMM optimization, including ordinary least squares, weighted least squares, instrumental variables regression, nonlinear least squares, and maximum likelihood estimation; each corresponds to a particular choice of the moment function g.

In method of moments, an alternative to the original (non-generalized) Method of Moments (MoM) is described, and references to some applications and a list of theoretical advantages and disadvantages relative to the traditional method are provided. This Bayesian-Like MoM (BL-MoM) is distinct from all the related methods described above, which are subsumed by the GMM. The literature does not contain a direct comparison between the GMM and the BL-MoM in specific applications.
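For example, ordinary least squares arises from the moment conditions E[x_t (y_t − x_t'θ)] = 0; with as many moment conditions as parameters, the sample moment equations can be solved exactly rather than minimized. A minimal sketch (names illustrative):

```python
import numpy as np

def ols_via_moments(y, x):
    """OLS as exactly identified GMM: solve the sample moment equations
    (1/T) sum_t x_t (y_t - x_t' theta) = 0 directly for theta."""
    return np.linalg.solve(x.T @ x, x.T @ y)
```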
Ergodic process

In physics, statistics, econometrics and signal processing, a stochastic process is said to be in an ergodic regime if an observable's ensemble average equals the time average. In this regime, any collection of random samples from a process must represent the average statistical properties of the entire regime. Conversely, a regime of a process that is not ergodic is said to be a non-ergodic regime. A regime implies a time window of a process over which the ergodicity measure is applied.

One can discuss the ergodicity of various statistics of a stochastic process. For example, a wide-sense stationary process X(t) has constant mean

{\displaystyle \mu _{X}=\operatorname {E} [X(t)],}

and autocovariance

{\displaystyle r_{X}(\tau )=\operatorname {E} [(X(t)-\mu _{X})(X(t+\tau )-\mu _{X})],}

which depends only on the lag τ and not on time t. The properties μ_X and r_X(τ) are ensemble averages (calculated over all possible sample functions X), not time averages.

The process X(t) is said to be mean-ergodic, or mean-square ergodic in the first moment, if the time-average estimate

{\displaystyle {\hat {\mu }}_{X}={\frac {1}{T}}\int _{0}^{T}X(t)\,dt}

converges in squared mean to the ensemble average μ_X as T → ∞. Likewise, the process is said to be autocovariance-ergodic, or ergodic in the second moment, if the time-average estimate of the autocovariance converges in squared mean to the ensemble average r_X(τ) as T → ∞. A process which is ergodic in the mean and autocovariance is sometimes called ergodic in the wide sense.
The notion of ergodicity also applies to discrete-time random processes X[n] for integer n. A discrete-time random process X[n] is ergodic in mean if

{\displaystyle {\hat {\mu }}_{X}={\frac {1}{N}}\sum _{n=1}^{N}X[n]}

converges in squared mean to the ensemble average E[X] as N → ∞. Ergodicity means the ensemble average equals the time average. The following examples illustrate this principle.

Each operator in a call centre spends time alternately speaking and listening on the telephone, as well as taking breaks between calls. Each break and each call are of different length, as are the durations of each 'burst' of speaking and listening, and indeed so is the rapidity of speech at any given moment, which could each be modelled as a random process.
Each resistor has an associated thermal noise that depends on the temperature. Take N resistors (N should be very large) and plot the voltage across those resistors for a long period. For each resistor you will have a waveform. Calculate the average value of that waveform; this gives you the time average. There are N waveforms, as there are N resistors. These N plots are known as an ensemble. Now take a particular instant of time in all those plots and find the average value of the voltage. That gives you the ensemble average at that instant. If the ensemble average and the time average are the same, the process is ergodic.
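A small simulation contrasting the two averages, in the spirit of the resistor ensemble above; the iid Gaussian noise and the frozen-offset non-ergodic process are illustrative assumptions, not part of the original example:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 1000, 5000                      # N realizations ("resistors"), T samples each

# Ergodic case: iid zero-mean noise; time and ensemble averages agree.
ergodic = rng.normal(0.0, 1.0, size=(N, T))
print(ergodic.mean(axis=1)[:3])        # time averages of three realizations, near 0
print(ergodic[:, 123].mean())          # ensemble average at one instant, also near 0

# Non-ergodic case: each realization carries its own fixed random offset, so a
# single time average converges to that offset, not to the ensemble mean.
offsets = rng.normal(0.0, 1.0, size=(N, 1))
nonergodic = offsets + rng.normal(0.0, 0.1, size=(N, T))
print(nonergodic.mean(axis=1)[:3])     # time averages differ across realizations
print(nonergodic[:, 123].mean())       # ensemble average still near 0
```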