
Minimum-variance unbiased estimator

Article snapshot taken from Wikipedia under the Creative Commons Attribution-ShareAlike license.

In statistics, a minimum-variance unbiased estimator (MVUE) or uniformly minimum-variance unbiased estimator (UMVUE) is an unbiased estimator whose variance is no higher than that of any other unbiased estimator, for all possible values of the parameter.


For practical statistics problems, it is important to determine the MVUE if one exists, since less-than-optimal procedures would naturally be avoided, other things being equal. This has led to substantial development of statistical theory related to the problem of optimal estimation. While combining the constraint of unbiasedness with the desirability metric of least variance leads to good results in most practical settings—making MVUE

A better estimator of θ, and is never worse. Sometimes one can very easily construct a very crude estimator g(X), and then evaluate that conditional expected value to get an estimator that is in various senses optimal. The theorem is named after C. R. Rao and David Blackwell. The process of transforming an estimator using the Rao–Blackwell theorem can be referred to as Rao–Blackwellization. The transformed estimator
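As a minimal illustration of Rao–Blackwellization (a sketch in Python, not part of the article): start from the crude unbiased estimator g(X) = X1 of a normal mean and condition on the sufficient statistic X̄; by exchangeability E[X1 | X̄] = X̄, and a short simulation shows the unchanged mean and the reduced variance.

import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 3.0, 2.0, 25, 20_000

x = rng.normal(mu, sigma, size=(reps, n))
crude = x[:, 0]                  # g(X) = X1: unbiased for mu, but noisy
rao_blackwell = x.mean(axis=1)   # E[X1 | Xbar] = Xbar; Xbar is sufficient for mu

print("means (both ~ mu):        ", crude.mean(), rao_blackwell.mean())
print("variances (sigma^2 vs /n):", crude.var(), rao_blackwell.var())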

A different number (depending on distribution), but this results in a biased estimator. This number is always larger than n − 1, so this is known as a shrinkage estimator, as it "shrinks" the unbiased estimator towards zero; for the normal distribution the optimal value is n + 1. Suppose X1, ..., Xn are independent and identically distributed (i.i.d.) random variables with expectation μ and variance σ². If
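The claim about the n + 1 divisor can be checked numerically; the following sketch (not part of the article) compares the mean squared error of the sum of squared deviations divided by n − 1, n, and n + 1 for normal data, where the n + 1 divisor should come out lowest.

import numpy as np

rng = np.random.default_rng(1)
sigma2, n, reps = 4.0, 10, 100_000

x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
ss = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)  # sum of squared deviations

for divisor in (n - 1, n, n + 1):
    est = ss / divisor
    print(f"divisor {divisor:2d}: bias = {est.mean() - sigma2:+.4f}, "
          f"MSE = {((est - sigma2) ** 2).mean():.4f}")
# Expected outcome: n - 1 gives zero bias, but n + 1 gives the smallest MSE for normal data.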

A family of densities p_θ, θ ∈ Ω, where Ω is the parameter space. An unbiased estimator δ(X1, X2, …, Xn) of g(θ)

A lower value of some loss function (particularly mean squared error) compared with unbiased estimators (notably in shrinkage estimators); or because in some cases being unbiased is too strong a condition, and the only unbiased estimators are not useful. Bias can also be measured with respect to the median, rather than the mean (expected value), in which case one distinguishes median-unbiased from

A natural starting point for a broad range of analyses—a targeted specification may perform better for a given problem; thus, MVUE is not always the best stopping point. Consider estimation of g(θ) based on data X1, X2, …, Xn i.i.d. from some member of

A non-linear function f and a mean-unbiased estimator U of a parameter p, the composite estimator f(U) need not be a mean-unbiased estimator of f(p). For example, the square root of the unbiased estimator of the population variance is not a mean-unbiased estimator of the population standard deviation: the square root of the unbiased sample variance, the corrected sample standard deviation,
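A quick numerical check (a sketch, not part of the article): because the square root is concave, Jensen's inequality predicts that the corrected sample standard deviation underestimates σ on average, even though the sample variance it is computed from is unbiased.

import numpy as np

rng = np.random.default_rng(2)
sigma, n, reps = 3.0, 8, 200_000

x = rng.normal(0.0, sigma, size=(reps, n))
s2 = x.var(axis=1, ddof=1)   # unbiased sample variance
s = np.sqrt(s2)              # corrected sample standard deviation

print("E[S^2] ~", s2.mean(), " target sigma^2 =", sigma**2)  # unbiased
print("E[S]   ~", s.mean(),  " target sigma   =", sigma)     # biased low, by Jensen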

A one-dimensional parameter θ will be said to be median-unbiased if, for fixed θ, the median of the distribution of the estimate is at the value θ; i.e., the estimate underestimates just as often as it overestimates. This requirement seems for most purposes to accomplish as much as the mean-unbiased requirement and has the additional property that it is invariant under one-to-one transformation. Further properties of median-unbiased estimators have been noted by Lehmann, Birnbaum, van der Vaart and Pfanzagl. In particular, median-unbiased estimators exist in cases where mean-unbiased and maximum-likelihood estimators do not exist. They are invariant under one-to-one transformations. There are methods of constructing median-unbiased estimators for probability distributions that have monotone likelihood functions, such as one-parameter exponential families, to ensure that they are optimal (in

A random sample from a scale-uniform distribution X ∼ U((1 − k)θ, (1 + k)θ), with unknown mean E[X] = θ and known design parameter k ∈ (0, 1). In

A sense analogous to the minimum-variance property considered for mean-unbiased estimators). One such procedure is an analogue of the Rao–Blackwell procedure for mean-unbiased estimators: the procedure holds for a smaller class of probability distributions than does the Rao–Blackwell procedure for mean-unbiased estimation, but for a larger class of loss functions. Any minimum-variance mean-unbiased estimator minimizes

A sufficient statistic for λ, i.e., the conditional distribution of the data X1, ..., Xn depends on λ only through this sum. Therefore, we find the Rao–Blackwell estimator δ1 = E[δ0(X1) | Sn]. After doing some algebra we have δ1 = (1 − 1/n)^{Sn}. Since the average number of calls arriving during the first n minutes is nλ, one might not be surprised if this estimator has a fairly high probability (if n is big) of being close to e^{−λ}. So δ1
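A simulation sketch of this example (not part of the article): the crude estimator δ0 = 1 if X1 = 0 (and 0 otherwise) and its Rao–Blackwellization δ1 = (1 − 1/n)^{Sn} are both unbiased for e^{−λ}, but δ1 has far smaller variance.

import numpy as np

rng = np.random.default_rng(3)
lam, n, reps = 2.0, 30, 100_000
target = np.exp(-lam)                  # probability of no calls in the next minute

x = rng.poisson(lam, size=(reps, n))
delta0 = (x[:, 0] == 0).astype(float)  # crude estimator: 1 iff no calls in minute 1
sn = x.sum(axis=1)                     # sufficient statistic
delta1 = (1.0 - 1.0 / n) ** sn         # Rao-Blackwell estimator E[delta0 | Sn]

for name, est in [("delta0 (crude)", delta0), ("delta1 (Rao-Blackwell)", delta1)]:
    print(f"{name:22s} mean = {est.mean():.4f} (target {target:.4f}), "
          f"variance = {est.var():.6f}")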


A transformation is applied to a mean-unbiased estimator, the result need not be a mean-unbiased estimator of its corresponding population statistic. By Jensen's inequality, a convex function as transformation will introduce positive bias, while a concave function will introduce negative bias, and a function of mixed convexity may introduce bias in either direction, depending on the specific function and distribution. That is, for

Is the sum of its variance and the square of its bias, the MVUE minimizes MSE among unbiased estimators. In some cases biased estimators have lower MSE because they have a smaller variance than does any unbiased estimator; see estimator bias. Consider the data to be a single observation from an absolutely continuous distribution on ℝ with density and we wish to find the UMVU estimator of First we recognize that

Is UMVUE if, for all θ ∈ Ω, var_θ(δ) ≤ var_θ(δ̃) for any other unbiased estimator δ̃. If an unbiased estimator of g(θ) exists, then one can prove there

Is a complete sufficient statistic for the family of densities. Then η(T) = E[δ(X1, X2, …, Xn) | T] is the MVUE for g(θ). A Bayesian analog is a Bayes estimator, particularly with minimum mean square error (MMSE). An efficient estimator need not exist, but if it does and if it is unbiased, it is the MVUE. Since the mean squared error (MSE) of an estimator δ

Is a distinct concept from consistency: consistent estimators converge in probability to the true value of the parameter, but may be biased or unbiased (see bias versus consistency for more). All else being equal, an unbiased estimator is preferable to a biased estimator, although in practice, biased estimators (with generally small bias) are frequently used. When a biased estimator is used, bounds of

Is a fixed, unknown constant that is part of this distribution), and then we construct some estimator θ̂ that maps observed data to values that we hope are close to θ. The bias of θ̂ relative to θ

Is a function of a complete, sufficient statistic is the UMVUE estimator. Put formally, suppose δ(X1, X2, …, Xn) is unbiased for g(θ), and that T

Is a result that characterizes the transformation of an arbitrarily crude estimator into an estimator that is optimal by the mean-squared-error criterion or any of a variety of similar criteria. The Rao–Blackwell theorem states that if g(X) is any kind of estimator of a parameter θ, then the conditional expectation of g(X) given T(X), where T is a sufficient statistic, is typically

Is an essentially unique MVUE. Using the Rao–Blackwell theorem one can also prove that determining the MVUE is simply a matter of finding a complete sufficient statistic for the family p_θ, θ ∈ Ω, and conditioning any unbiased estimator on it. Further, by the Lehmann–Scheffé theorem, an unbiased estimator that

Is an orthogonal decomposition, the Pythagorean theorem says |C|² = |A|² + |B|², and taking expectations we get nσ² = n E[(X̄ − μ)²] + n E[S²], as above (but times n). If
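A small numerical check of this decomposition (a sketch, not part of the article): for each simulated sample the squared norms satisfy |C|² = |A|² + |B|² exactly, and averaging over many samples reproduces nσ² = n E[(X̄ − μ)²] + n E[S²].

import numpy as np

rng = np.random.default_rng(4)
mu, sigma, n, reps = 1.0, 2.0, 6, 200_000

x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1, keepdims=True)

C = x - mu                               # (X1 - mu, ..., Xn - mu)
A = np.broadcast_to(xbar - mu, x.shape)  # projection onto u = (1, ..., 1)
B = x - xbar                             # orthogonal complement part

lhs = (C ** 2).sum(axis=1)
rhs = (A ** 2).sum(axis=1) + (B ** 2).sum(axis=1)
print("max |lhs - rhs|:", np.abs(lhs - rhs).max())   # ~ 0, rounding error only

term1 = n * ((xbar[:, 0] - mu) ** 2).mean()          # ~ sigma^2
term2 = (B ** 2).sum(axis=1).mean()                  # n*E[S^2] ~ (n - 1)*sigma^2
print("n*sigma^2 =", n * sigma**2, " vs  term1 + term2 =", term1 + term2)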


Is an unbiased estimator for parameter θ, it is not guaranteed that g(θ̂) is an unbiased estimator for g(θ). In a simulation experiment concerning the properties of an estimator, the bias of the estimator may be assessed using the mean signed difference. The sample variance of a random variable demonstrates two aspects of estimator bias: firstly,

Is an unbiased estimator of the population variance, σ². The ratio between the biased (uncorrected) and unbiased estimates of the variance is known as Bessel's correction. The reason that an uncorrected sample variance, S², is biased stems from the fact that the sample mean is an ordinary least squares (OLS) estimator for μ: X̄

Is an unbiased estimator of the population variance. Algebraically speaking, S² is unbiased because: where the transition to the second line uses the result derived above for the biased estimator. Thus E[S²] = σ², and therefore S² = (1/(n − 1)) ∑_{i=1}^{n} (Xi − X̄)²
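A simulation sketch of Bessel's correction (not part of the article): dividing the sum of squared deviations by n gives an estimator with expectation (n − 1)σ²/n, while dividing by n − 1 restores unbiasedness.

import numpy as np

rng = np.random.default_rng(5)
sigma2, n, reps = 9.0, 5, 300_000

x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
uncorrected = x.var(axis=1, ddof=0)   # divide by n      -> biased
corrected = x.var(axis=1, ddof=1)     # divide by n - 1  -> unbiased

print("E[uncorrected] ~", uncorrected.mean(), "  theory:", (n - 1) / n * sigma2)
print("E[corrected]   ~", corrected.mean(),   "  theory:", sigma2)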

Is biased. The bias depends both on the sampling distribution of the estimator and on the transform, and can be quite involved to calculate – see unbiased estimation of standard deviation for a discussion in this case. While bias quantifies the average difference to be expected between an estimator and an underlying parameter, an estimator based on a finite sample can additionally be expected to differ from

Is called the Rao–Blackwell estimator. One case of the Rao–Blackwell theorem states: In other words, The essential tools of the proof besides the definition above are the law of total expectation and the fact that for any random variable Y, E(Y²) cannot be less than [E(Y)]². That inequality is a case of Jensen's inequality, although it may also be shown to follow instantly from the frequently mentioned fact that the variance of Y, E(Y²) − [E(Y)]², is nonnegative. More precisely,

Is clearly a very much improved estimator of that last quantity. In fact, since Sn is complete and δ0 is unbiased, δ1 is the unique minimum variance unbiased estimator by the Lehmann–Scheffé theorem. Rao–Blackwellization is an idempotent operation. Using it to improve the already improved estimator does not obtain a further improvement, but merely returns as its output the same improved estimator. If

Is defined as Bias(θ̂, θ) = E_{x∣θ}[θ̂] − θ = E_{x∣θ}[θ̂ − θ], where E_{x∣θ} denotes expected value over the distribution P(x ∣ θ) (i.e., averaging over all possible observations x). The second equation follows since θ

Is in fact true in general, as explained above. A far more extreme case of a biased estimator being better than any unbiased estimator arises from the Poisson distribution. Suppose that X has a Poisson distribution with expectation λ. Suppose it is desired to estimate e^{−2λ} with a sample of size 1. (For example, when incoming calls at a telephone switchboard are modeled as a Poisson process, and λ

Is its value always positive but it is also more accurate in the sense that its mean squared error is smaller; compare the unbiased estimator's MSE of The MSEs are functions of the true value λ. The bias of the maximum-likelihood estimator is: The bias of maximum-likelihood estimators can be substantial. Consider a case where n tickets numbered from 1 to n are placed in a box and one

Is measurable with respect to the conditional distribution P(x ∣ θ). An estimator is said to be unbiased if its bias is equal to zero for all values of parameter θ, or equivalently, if the expected value of the estimator matches that of the parameter. Unbiasedness is not guaranteed to carry over. For example, if θ̂


Is not a function of T = (X(1), X(n)), the minimal sufficient statistic for θ (where X(1) = min(Xi) and X(n) = max(Xi)), it may be improved using
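A simulation sketch of this improvement (not part of the article), under the assumption, not stated in the snapshot, that conditioning X1 on T = (X(1), X(n)) yields the midrange (X(1) + X(n))/2 for this uniform model: both estimators are unbiased for θ, but the conditioned one has much smaller variance.

import numpy as np

rng = np.random.default_rng(6)
theta, k, n, reps = 5.0, 0.5, 20, 100_000

x = rng.uniform((1 - k) * theta, (1 + k) * theta, size=(reps, n))
crude = x[:, 0]                                   # X1, unbiased for theta
midrange = (x.min(axis=1) + x.max(axis=1)) / 2.0  # assumed E[X1 | X(1), X(n)]

for name, est in [("X1 (crude)", crude), ("midrange (conditioned)", midrange)]:
    print(f"{name:22s} mean = {est.mean():.4f} (theta = {theta}), "
          f"variance = {est.var():.5f}")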

Is selected at random, giving a value X. If n is unknown, then the maximum-likelihood estimator of n is X, even though the expectation of X given n is only (n + 1)/2; we can be certain only that n is at least X and is probably more. In this case, the natural unbiased estimator is 2X − 1. The theory of median-unbiased estimators was revived by George W. Brown in 1947: An estimate of
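A quick simulation of the ticket example discussed above (a sketch, not part of the article): the maximum-likelihood estimator X underestimates n on average, while 2X − 1 is unbiased.

import numpy as np

rng = np.random.default_rng(7)
n_true, reps = 40, 200_000

x = rng.integers(1, n_true + 1, size=reps)  # one ticket drawn uniformly from {1, ..., n}
mle = x                                     # maximum-likelihood estimator of n
unbiased = 2 * x - 1                        # natural unbiased estimator

print("E[MLE]    ~", mle.mean(),      "  theory:", (n_true + 1) / 2)
print("E[2X - 1] ~", unbiased.mean(), "  theory:", n_true)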

Is sought for the population variance as above, but this time to minimise the MSE: If the variables X1 ... Xn follow a normal distribution, then nS²/σ² has a chi-squared distribution with n − 1 degrees of freedom, giving: and so With a little algebra it can be confirmed that it is c = 1/(n + 1) which minimises this combined loss function, rather than c = 1/(n − 1) which minimises just

Is the average number of calls per minute, then e^{−2λ} is the probability that no calls arrive in the next two minutes.) Since the expectation of an unbiased estimator δ(X) is equal to the estimand, i.e. the only function of the data constituting an unbiased estimator is δ(X) = (−1)^X. To see this, note that when decomposing e^{−λ} from the above expression for expectation, the sum that is left is a Taylor series expansion of e^{−λ} as well, yielding e^{−λ} · e^{−λ} = e^{−2λ} (see Characterizations of

Is the number that makes the sum ∑_{i=1}^{n} (Xi − X̄)² as small as possible. That is, when any other number is plugged into this sum, the sum can only increase. In particular, the choice μ ≠ X̄ gives, and then The above discussion can be understood in geometric terms:

Is the trace (diagonal sum) of the covariance matrix of the estimator and ‖Bias(θ̂, θ)‖² is the square vector norm. For example, suppose an estimator of the form

The n − 1 directions perpendicular to u, so that E[(X̄ − μ)²] = σ²/n and E[S²] = (n − 1)σ²/n. This

The risk (expected loss) with respect to the squared-error loss function (among mean-unbiased estimators), as observed by Gauss. A minimum-average absolute deviation median-unbiased estimator minimizes the risk with respect to the absolute loss function (among median-unbiased estimators), as observed by Laplace. Other loss functions are used in statistics, particularly in robust statistics. For univariate parameters, median-unbiased estimators remain median-unbiased under transformations that preserve order (or reverse order). Note that, when

The sample mean and uncorrected sample variance are defined as X̄ = (1/n) ∑_{i=1}^{n} Xi and S² = (1/n) ∑_{i=1}^{n} (Xi − X̄)², then S² is a biased estimator of σ², because To continue, we note that by subtracting μ from both sides of X̄ = (1/n) ∑_{i=1}^{n} Xi, we get Meaning, (by cross-multiplication) n · (X̄ − μ) = ∑_{i=1}^{n} (Xi − μ). Then,

The MVUE Clearly δ(X) = T²/2 is unbiased and T = log(1 + e^{−x}) is complete sufficient, thus the UMVU estimator is This example illustrates that an unbiased function of
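A numerical sketch of this example (not part of the article). The density is elided in the snapshot; the code below assumes the form p_θ(x) = θ e^{−x} / (1 + e^{−x})^{θ+1}, which is consistent with the sufficient statistic quoted above, so that T = log(1 + e^{−X}) is exponentially distributed with rate θ and δ = T²/2 is unbiased for 1/θ².

import numpy as np

rng = np.random.default_rng(8)
theta, reps = 2.5, 500_000

# Inverse-CDF sampling under the assumed density: F(x) = (1 + e^{-x})^{-theta}
u = rng.uniform(size=reps)
x = -np.log(u ** (-1.0 / theta) - 1.0)

t = np.log1p(np.exp(-x))    # sufficient statistic T = log(1 + e^{-x})
delta = t ** 2 / 2.0        # candidate UMVU estimator

print("mean of T       ~", t.mean(),     "  theory 1/theta   =", 1 / theta)
print("mean of T^2 / 2 ~", delta.mean(), "  theory 1/theta^2 =", 1 / theta**2)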


The Rao–Blackwell estimator is no worse than the original estimator. In practice, however, the improvement is often enormous. Phone calls arrive at a switchboard according to a Poisson process at an average rate of λ per minute. This rate is not observable, but the numbers X1, ..., Xn of phone calls that arrived during n successive one-minute periods are observed. It is desired to estimate

The Rao–Blackwell theorem speaks of the "expected loss" or risk function: where the "loss function" L may be any convex function. If the loss function is twice-differentiable, as in the case for mean-squared error, then we have the sharper inequality The improved estimator is unbiased if and only if the original estimator is unbiased, as may be seen at once by using the law of total expectation. The theorem holds regardless of whether biased or unbiased estimators are used. The theorem seems very weak: it says only that

The bias are calculated. A biased estimator may be used for various reasons: because an unbiased estimator does not exist without further assumptions about a population; because an estimator is difficult to compute (as in unbiased estimation of standard deviation); because a biased estimator may be unbiased with respect to different measures of central tendency; because a biased estimator gives

The complete sufficient statistic will be UMVU, as the Lehmann–Scheffé theorem states. Unbiasedness In statistics, the bias of an estimator (or bias function) is the difference between this estimator's expected value and the true value of the parameter being estimated. An estimator or decision rule with zero bias is called unbiased. In statistics, "bias" is an objective property of an estimator. Bias

The conditioning statistic is both complete and sufficient, and the starting estimator is unbiased, then the Rao–Blackwell estimator is the unique "best unbiased estimator": see Lehmann–Scheffé theorem. An example of an improvable Rao–Blackwell improvement, when using a minimal sufficient statistic that is not complete, was provided by Galili and Meilijson in 2016. Let X1, …, Xn be

The density can be written as Which is an exponential family with sufficient statistic T = log(1 + e^{−x}). In fact this is a full rank exponential family, and therefore T is complete sufficient. See exponential family for a derivation which shows Therefore, Here we use the Lehmann–Scheffé theorem to get

The distribution of C is rotationally symmetric, as in the case when Xi are sampled from a Gaussian, then on average, the dimension along u contributes to |C|² equally as

The expected value of the uncorrected sample variance does not equal the population variance σ², unless multiplied by a normalization factor. The sample mean, on the other hand, is an unbiased estimator of the population mean μ. Note that the usual definition of sample variance is S² = (1/(n − 1)) ∑_{i=1}^{n} (Xi − X̄)², and this

The exponential function). If the observed value of X is 100, then the estimate is 1, although the true value of the quantity being estimated is very likely to be near 0, which is the opposite extreme. And, if X is observed to be 101, then the estimate is even more absurd: It is −1, although the quantity being estimated must be positive. The (biased) maximum likelihood estimator is far better than this unbiased estimator. Not only
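A simulation sketch (not part of the article) comparing the only unbiased estimator, δ(X) = (−1)^X, with the biased maximum-likelihood estimator e^{−2X}: the unbiased estimator is exactly unbiased but wildly variable, while the MLE has a far smaller mean squared error.

import numpy as np

rng = np.random.default_rng(9)
lam, reps = 3.0, 500_000
target = np.exp(-2 * lam)       # estimand e^{-2*lambda}

x = rng.poisson(lam, size=reps)
unbiased = (-1.0) ** x          # the only unbiased estimator from a sample of size 1
mle = np.exp(-2.0 * x)          # biased maximum-likelihood estimator

for name, est in [("(-1)^X (unbiased)", unbiased), ("exp(-2X) (MLE)", mle)]:
    print(f"{name:18s} bias = {est.mean() - target:+.5f}, "
          f"MSE = {((est - target) ** 2).mean():.5f}")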

The formal sampling-theory sense above) of their estimates. For example, Gelman and coauthors (1995) write: "From a Bayesian perspective, the principle of unbiasedness is reasonable in the limit of large samples, but otherwise it is potentially misleading." Rao–Blackwell theorem In statistics, the Rao–Blackwell theorem, sometimes referred to as the Rao–Blackwell–Kolmogorov theorem,


The mean square error of the Rao–Blackwell estimator has the following decomposition Since E[Var(δ(X) ∣ T(X))] ≥ 0, the Rao–Blackwell theorem immediately follows. The more general version of

The naive estimator is biased, which can be corrected by a scale factor; second, the unbiased estimator is not optimal in terms of mean squared error (MSE), which can be minimized by using a different scale factor, resulting in a biased estimator with lower MSE than the unbiased estimator. Concretely, the naive estimator sums the squared deviations and divides by n, which is biased. Dividing instead by n − 1 yields an unbiased estimator. Conversely, MSE can be minimized by dividing by

The parameter due to the randomness in the sample. An estimator that minimises the bias will not necessarily minimise the mean square error. One measure which is used to try to reflect both types of difference is the mean square error, This can be shown to be equal to the square of the bias, plus the variance: When the parameter is a vector, an analogous decomposition applies: where trace(Cov(θ̂))
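A small numerical check of the decomposition MSE = bias² + variance (a sketch, not part of the article), using the uncorrected sample variance as the estimator:

import numpy as np

rng = np.random.default_rng(10)
sigma2, n, reps = 4.0, 6, 300_000

x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
est = x.var(axis=1, ddof=0)     # uncorrected sample variance (biased)

bias = est.mean() - sigma2
variance = est.var()
mse = ((est - sigma2) ** 2).mean()

print("bias^2 + variance =", bias**2 + variance)
print("MSE               =", mse)   # identical up to floating-point rounding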

The part along u and B = (X1 − X̄, …, Xn − X̄) for the complementary part. Since this

The previous becomes: This can be seen by noting the following formula, which follows from the Bienaymé formula, for the term in the inequality for the expectation of the uncorrected sample variance above: E[(X̄ − μ)²] = σ²/n. In other words,

The probability e^{−λ} that the next one-minute period passes with no phone calls. An extremely crude estimator of the desired probability is i.e., it estimates this probability to be 1 if no phone calls arrived in the first minute and zero otherwise. Despite the apparent limitations of this estimator, the result given by its Rao–Blackwellization is a very good estimator. The sum can be readily shown to be

The reciprocal of the parameter of a binomial random variable. Suppose we have a statistical model, parameterized by a real number θ, giving rise to a probability distribution for observed data, P_θ(x) = P(x ∣ θ), and a statistic θ̂ which serves as an estimator of θ based on any observed data x. That is, we assume that our data follows some unknown distribution P(x ∣ θ) (where θ

The search for "best" possible unbiased estimators for θ, it is natural to consider X1 as an initial (crude) unbiased estimator for θ and then try to improve it. Since X1

The square of the bias. More generally it is only in restricted classes of problems that there will be an estimator that minimises the MSE independently of the parameter values. However it is very common that there may be perceived to be a bias–variance tradeoff, such that a small increase in bias can be traded for a larger decrease in variance, resulting in a more desirable estimator overall. Most Bayesians are rather unconcerned about unbiasedness (at least in

The usual mean-unbiasedness property. Mean-unbiasedness is not preserved under non-linear transformations, though median-unbiasedness is (see § Effect of transformations); for example, the sample variance is a biased estimator for the population variance. These are all illustrated below. An unbiased estimator for a parameter need not always exist. For example, there is no unbiased estimator for


The vector C = (X1 − μ, …, Xn − μ) can be decomposed into the "mean part" and "variance part" by projecting to the direction of u = (1, …, 1) and to that direction's orthogonal complement hyperplane. One gets A = (X̄ − μ, …, X̄ − μ) for
