In statistics, identifiability is a property which a model must satisfy for precise inference to be possible. A model is identifiable if it is theoretically possible to learn the true values of this model's underlying parameters after obtaining an infinite number of observations from it. Mathematically, this is equivalent to saying that different values of the parameters must generate different probability distributions of the observable variables. Usually the model is identifiable only under certain technical restrictions, in which case the set of these requirements is called the identification conditions.
A model that fails to be identifiable is said to be non-identifiable or unidentifiable: two or more parametrizations are observationally equivalent. In some cases, even though a model is non-identifiable, it is still possible to learn the true values of a certain subset of the model parameters. In this case we say that the model is partially identifiable. In other cases it may be possible to learn
A mean or a standard deviation. If a population exactly follows a known and defined distribution, for example the normal distribution, then a small set of parameters can be measured which provide a comprehensive description of the population, and can be considered to define a probability distribution for the purposes of extracting samples from this population. A "parameter" is to a population as
A normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is {\displaystyle f(x)={\frac {1}{\sqrt {2\pi \sigma ^{2}}}}e^{-{\frac {(x-\mu )^{2}}{2\sigma ^{2}}}}\,.} The parameter μ
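The density formula above can be evaluated directly. This is a minimal illustrative sketch, not part of the original article; the helper name `normal_pdf` is mine:

```python
import math

def normal_pdf(x, mu=0.0, sigma2=1.0):
    """General normal density f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / sqrt(2 pi sigma^2)."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

# The standard normal (mu = 0, sigma^2 = 1) peaks at x = 0 with height 1/sqrt(2 pi).
peak = normal_pdf(0.0)
```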
a statistical model with parameter space Θ. We say that 𝒫 is identifiable if the mapping θ ↦ P_θ is one-to-one: This definition means that distinct values of θ should correspond to distinct probability distributions: if θ₁ ≠ θ₂, then also P_{θ₁} ≠ P_{θ₂}. If
265-432: A " statistic " is to a sample ; that is to say, a parameter describes the true value calculated from the full population (such as the population mean ), whereas a statistic is an estimated measurement of the parameter based on a sample (such as the sample mean ). Thus a "statistical parameter" can be more specifically referred to as a population parameter . Suppose that we have an indexed family of distributions. If
a fixed collection of independent normal deviates is a normal deviate. Many results and methods, such as propagation of uncertainty and least squares parameter fitting, can be derived analytically in explicit form when the relevant variables are normally distributed. A normal distribution is sometimes informally called a bell curve. However, many other distributions are bell-shaped (such as
a generic normal distribution with density f, mean μ and variance σ², the cumulative distribution function is {\displaystyle F(x)=\Phi \left({\frac {x-\mu }{\sigma }}\right)={\frac {1}{2}}\left[1+\operatorname {erf} \left({\frac {x-\mu }{\sigma {\sqrt {2}}}}\right)\right]\,.} The complement of
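This relation between the normal CDF and the error function can be sketched in a few lines using the standard-library `math.erf`; the function name `normal_cdf` is my own:

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """F(x) = Phi((x - mu)/sigma) = 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
```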
a known approximate solution, x₀, to the desired Φ(x). x₀ may be a value from a distribution table, or an intelligent estimate followed by a computation of Φ(x₀) using any desired means to compute. Use this value of x₀ and
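The Newton iteration described here can be sketched as follows. This is an illustrative implementation under the text's setup (derivative of Φ is the standard normal density); for simplicity it evaluates Φ via `math.erf` rather than the Taylor expansion, and the helper names are mine:

```python
import math

def phi(x):
    """Standard normal CDF, computed here via erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi_density(x):
    """Standard normal density: the derivative of phi, used as Newton's slope."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def invert_phi(target, x0=0.0, tol=1e-12, max_iter=60):
    """Newton's method: x <- x - (phi(x) - target) / phi'(x), from an initial guess x0."""
    x = x0
    for _ in range(max_iter):
        step = (phi(x) - target) / phi_density(x)
        x -= step
        if abs(step) < tol:
            break
    return x
```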
a variance of 1/2, and Stephen Stigler once defined the standard normal as {\displaystyle \varphi (z)=e^{-\pi z^{2}},} which has a simple functional form and a variance of σ² = 1/(2π). Every normal distribution
is a normal deviate with parameters μ and σ², then this X distribution can be re-scaled and shifted via the formula Z = (X − μ)/σ to convert it to
is a version of the standard normal distribution, whose domain has been stretched by a factor σ (the standard deviation) and then translated by μ (the mean value): {\displaystyle f(x\mid \mu ,\sigma ^{2})={\frac {1}{\sigma }}\varphi \left({\frac {x-\mu }{\sigma }}\right)\,.} The probability density must be scaled by 1/σ so that
is advantageous because of a much simpler and easier-to-remember formula, and simple approximate formulas for the quantiles of the distribution. Normal distributions form an exponential family with natural parameters θ₁ = μ/σ² and θ₂ = −1/(2σ²), and natural statistics x and x². The dual expectation parameters for the normal distribution are η₁ = μ and η₂ = μ² + σ². The cumulative distribution function (CDF) of
is also used quite often. The normal distribution is often referred to as N(μ, σ²) or 𝒩(μ, σ²). Thus when a random variable X
is called a normal deviate. Normal distributions are important in statistics and are often used in the natural and social sciences to represent real-valued random variables whose distributions are not known. Their importance is partly due to the central limit theorem. It states that, under some conditions, the average of many samples (observations) of a random variable with finite mean and variance
is described by this probability density function (or density): {\displaystyle \varphi (z)={\frac {e^{\frac {-z^{2}}{2}}}{\sqrt {2\pi }}}\,.} The variable z has a mean of 0 and a variance and standard deviation of 1. The density φ(z) has its peak 1/√(2π) at z = 0 and inflection points at z = +1 and z = −1. Although
is equivalent to saying that the standard normal distribution Z can be scaled/stretched by a factor of σ and shifted by μ to yield a different normal distribution, called X. Conversely, if X
is invertible. Thus, this is the identification condition in the model. Suppose 𝒫 is the classical errors-in-variables linear model: where (ε, η, x*) are jointly normal independent random variables with zero expected value and unknown variances, and only the variables (x, y) are observed. Then this model is not identifiable, only
is itself a random variable whose distribution converges to a normal distribution as the number of samples increases. Therefore, physical quantities that are expected to be the sum of many independent processes, such as measurement errors, often have distributions that are nearly normal. Moreover, Gaussian distributions have some unique properties that are valuable in analytic studies. For instance, any linear combination of
is normally distributed with mean μ and standard deviation σ, one may write {\displaystyle X\sim {\mathcal {N}}(\mu ,\sigma ^{2}).} Some authors advocate using the precision τ as
is the indicator function). Thus, with an infinite number of observations we will be able to find the true probability distribution P₀ in the model, and since the identifiability condition above requires that the map θ ↦ P_θ be invertible, we will also be able to find the true value of
is the mean or expectation of the distribution (and also its median and mode), while the parameter σ² is the variance. The standard deviation of the distribution is σ (sigma). A random variable with a Gaussian distribution is said to be normally distributed, and
is very close to zero, and simplifies formulas in some contexts, such as in the Bayesian inference of variables with multivariate normal distribution. Alternatively, the reciprocal of the standard deviation τ′ = 1/σ might be defined as the precision, in which case the expression of the normal distribution becomes {\displaystyle f(x)={\frac {\tau '}{\sqrt {2\pi }}}e^{-(\tau ')^{2}(x-\mu )^{2}/2}.} According to Stigler, this formulation
the e^{ax²} family of derivatives may be used to easily construct a rapidly converging Taylor series expansion using recursive entries about any point of known value of the distribution, Φ(x₀): {\displaystyle \Phi (x)=\sum _{n=0}^{\infty }{\frac {\Phi ^{(n)}(x_{0})}{n!}}(x-x_{0})^{n}\,,} where: {\displaystyle {\begin{aligned}\Phi ^{(0)}(x_{0})&={\frac {1}{\sqrt {2\pi }}}\int _{-\infty }^{x_{0}}e^{-t^{2}/2}\,dt\\\Phi ^{(1)}(x_{0})&={\frac {1}{\sqrt {2\pi }}}e^{-x_{0}^{2}/2}\\\Phi ^{(n)}(x_{0})&=-\left(x_{0}\Phi ^{(n-1)}(x_{0})+(n-2)\Phi ^{(n-2)}(x_{0})\right),&n\geq 2\,.\end{aligned}}} An application for
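The recursive derivative scheme above can be sketched directly: build the derivatives Φ⁽ⁿ⁾(x₀) by the recursion and sum the Taylor series. This is an illustrative sketch (function and variable names are mine):

```python
import math

def phi_taylor(x, x0, phi_at_x0, n_terms=40):
    """Sum Phi^(n)(x0)/n! * (x - x0)^n using the recursion
    Phi^(1)(x0) = exp(-x0^2/2)/sqrt(2 pi),
    Phi^(n)(x0) = -(x0 * Phi^(n-1)(x0) + (n-2) * Phi^(n-2)(x0)) for n >= 2."""
    deriv = [phi_at_x0, math.exp(-x0 * x0 / 2.0) / math.sqrt(2.0 * math.pi)]
    for n in range(2, n_terms):
        deriv.append(-(x0 * deriv[n - 1] + (n - 2) * deriv[n - 2]))
    total, term = 0.0, 1.0  # term tracks (x - x0)^n / n!
    for n in range(n_terms):
        total += deriv[n] * term
        term *= (x - x0) / (n + 1)
    return total
```

Expanding about x₀ = 0, where Φ(0) = 1/2 is known exactly, reproduces Φ(x) near the origin.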
the Q-function, all of which are simple transformations of Φ, are also used occasionally. The graph of the standard normal cumulative distribution function Φ has 2-fold rotational symmetry around the point (0, 1/2); that is, Φ(−x) = 1 − Φ(x). Its antiderivative (indefinite integral) can be expressed as follows: {\displaystyle \int \Phi (x)\,dx=x\Phi (x)+\varphi (x)+C.} The cumulative distribution function of
the Cauchy, Student's t, and logistic distributions). (For other names, see Naming.) The univariate probability distribution is generalized for vectors in the multivariate normal distribution and for matrices in the matrix normal distribution. The simplest case of a normal distribution is known as the standard normal distribution or unit normal distribution. This is a special case when μ = 0 and σ² = 1, and it
the dependent variables are related to the independent variables. During an election, there may be specific percentages of voters in a country who would vote for each particular candidate – these percentages would be statistical parameters. It is impractical to ask every voter before an election occurs what their candidate preferences are, so a sample of voters will be polled, and a statistic (also called an estimator) – that is,
the double factorial. An asymptotic expansion of the cumulative distribution function for large x can also be derived using integration by parts. For more, see Error function#Asymptotic expansion. A quick approximation to the standard normal distribution's cumulative distribution function can be found by using a Taylor series approximation: {\displaystyle \Phi (x)\approx {\frac {1}{2}}+{\frac {1}{\sqrt {2\pi }}}\sum _{k=0}^{n}{\frac {(-1)^{k}x^{(2k+1)}}{2^{k}k!(2k+1)}}\,.} The recursive nature of
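The finite Taylor approximation above translates directly into code; this is a minimal sketch (the name `phi_approx` is mine), accurate for moderate x when enough terms are taken:

```python
import math

def phi_approx(x, n=30):
    """Phi(x) ~= 1/2 + (1/sqrt(2 pi)) * sum_{k=0}^{n} (-1)^k x^(2k+1) / (2^k k! (2k+1))."""
    s = sum((-1) ** k * x ** (2 * k + 1) / (2 ** k * math.factorial(k) * (2 * k + 1))
            for k in range(n + 1))
    return 0.5 + s / math.sqrt(2.0 * math.pi)
```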
the integral is still 1. If Z is a standard normal deviate, then X = σZ + μ will have a normal distribution with expected value μ and standard deviation σ. This
the Taylor series expansion above to minimize computations. Repeat the following process until the difference between the computed Φ(xₙ) and the desired Φ, which we will call Φ(desired),
the Taylor series expansion above to minimize the number of computations. Newton's method is ideal to solve this problem because the first derivative of Φ(x), which is an integral of the standard normal density, is the standard normal density itself, and is readily available to use in the Newton's method solution. To solve, select
the above Taylor series expansion is to use Newton's method to reverse the computation. That is, if we have a value for the cumulative distribution function, Φ(x), but do not know the x needed to obtain the Φ(x), we can use Newton's method to find x, and use
the density above is most commonly known as the standard normal, a few authors have used that term to describe other versions of the normal distribution. Carl Friedrich Gauss, for example, once defined the standard normal as {\displaystyle \varphi (z)={\frac {e^{-z^{2}}}{\sqrt {\pi }}},} which has
the distribution is known exactly. The family of chi-squared distributions can be indexed by the number of degrees of freedom: the number of degrees of freedom is a parameter for the distributions, and so the family is thereby parameterized. In statistical inference, parameters are sometimes taken to be unobservable, and in this case the statistician's task is to estimate or infer what they can about
the distribution then becomes {\displaystyle f(x)={\sqrt {\frac {\tau }{2\pi }}}e^{-\tau (x-\mu )^{2}/2}.} This choice is claimed to have advantages in numerical computations when σ
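The precision parameterization τ = 1/σ² gives the same density as the standard form, which a small sketch can confirm numerically (the helper names below are mine, not from the article):

```python
import math

def pdf_precision(x, mu=0.0, tau=1.0):
    """f(x) = sqrt(tau / (2 pi)) * exp(-tau * (x - mu)^2 / 2), with tau = 1/sigma^2."""
    return math.sqrt(tau / (2.0 * math.pi)) * math.exp(-tau * (x - mu) ** 2 / 2.0)

def pdf_sigma(x, mu=0.0, sigma=1.0):
    """Same density written with the standard deviation."""
    z = (x - mu) / sigma
    return math.exp(-z * z / 2.0) / (sigma * math.sqrt(2.0 * math.pi))
```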
the distributions are defined in terms of the probability density functions (pdfs), then two pdfs should be considered distinct only if they differ on a set of non-zero measure (for example two functions ƒ₁(x) = 1_{0 ≤ x < 1} and ƒ₂(x) = 1_{0 ≤ x ≤ 1} differ only at a single point x = 1, a set of measure zero, and thus cannot be considered as distinct pdfs). Identifiability of
the following: Where a probability distribution has a domain over a set of objects that are themselves probability distributions, the term concentration parameter is used for quantities that index how variable the outcomes would be. Quantities such as regression coefficients are statistical parameters in the above sense because they index the family of conditional probability distributions that describe how
the index is also a parameter of the members of the family, then the family is a parameterized family. Among parameterized families of distributions are the normal distributions, the Poisson distributions, the binomial distributions, and the exponential family of distributions. For example, the family of normal distributions has two parameters, the mean and the variance: if those are specified,
the kind of statistical procedure being carried out (for example, the number of degrees of freedom in a Pearson's chi-squared test). Even if a family of distributions is not specified, quantities such as the mean and variance can generally still be regarded as statistical parameters of the population, and statistical procedures can still attempt to make inferences about such population parameters. Parameters are given names appropriate to their roles, including
the location of the true parameter up to a certain finite region of the parameter space, in which case the model is set identifiable. Aside from strictly theoretical exploration of the model properties, identifiability can be referred to in a wider scope when a model is tested with experimental data sets, using identifiability analysis. Let {\displaystyle {\mathcal {P}}=\{P_{\theta }:\theta \in \Theta \}} be
the model in the sense of invertibility of the map θ ↦ P_θ is equivalent to being able to learn the model's true parameter if the model can be observed indefinitely long. Indeed, if {X_t} ⊆ S is the sequence of observations from the model, then by the strong law of large numbers, for every measurable set A ⊆ S (here 1_{...}
the model is identifiable: ƒ_{θ₁} = ƒ_{θ₂} ⇔ θ₁ = θ₂. Let 𝒫 be the standard linear regression model: (where ′ denotes matrix transpose). Then the parameter β is identifiable if and only if the matrix E[xx′]
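This identification condition can be checked on data via the sample analogue of E[xx′]: if the moment matrix is (numerically) singular, β is not identified. A hedged sketch for two regressors, with collinear columns as the failing case (all helper names are mine):

```python
def gram_2x2(rows):
    """Sample moment matrix (1/n) * sum of x x' for 2-dimensional regressors."""
    n = len(rows)
    m = [[0.0, 0.0], [0.0, 0.0]]
    for a, b in rows:
        m[0][0] += a * a / n
        m[0][1] += a * b / n
        m[1][0] += b * a / n
        m[1][1] += b * b / n
    return m

def is_identified(rows, eps=1e-10):
    """beta is identified iff the moment matrix is invertible (nonzero determinant)."""
    m = gram_2x2(rows)
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return abs(det) > eps

# Second regressor is exactly 2x the first: singular moment matrix, beta not identified.
collinear = is_identified([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])      # False
# Linearly independent regressors: beta identified.
independent = is_identified([(1.0, 0.5), (2.0, -1.0), (3.0, 2.0)])   # True
```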
the normality assumption and require that x* were not normally distributed, retaining only the independence condition ε ⊥ η ⊥ x*, then the model becomes identifiable.

Statistical parameter

In statistics, as opposed to its general use in mathematics, a parameter is any quantity of a statistical population that summarizes or describes an aspect of the population, such as
the parameter based on a random sample of observations taken from the full population. Estimators of a set of parameters of a specific distribution are often measured for a population, under the assumption that the population is (at least approximately) distributed according to that specific probability distribution. In other situations, parameters may be fixed by the nature of the sampling procedure used or
the parameter defining the width of the distribution, instead of the standard deviation σ or the variance σ². The precision is normally defined as the reciprocal of the variance, 1/σ². The formula for
the parameter which generated the given distribution P₀. Let 𝒫 be the normal location-scale family. Then this expression is equal to zero for almost all x only when all its coefficients are equal to zero, which is only possible when |σ₁| = |σ₂| and μ₁ = μ₂. Since the scale parameter σ is restricted to be greater than zero, we conclude that
the percentage of the sample of polled voters – will be measured instead. The statistic, along with an estimation of its accuracy (known as its sampling error), is then used to make inferences about the true statistical parameters (the percentages of all voters). Similarly, in some forms of testing of manufactured products, rather than destructively testing all products, only a sample of products are tested. Such tests gather statistics supporting an inference that
the probability of a random variable, with normal distribution of mean 0 and variance 1/2, falling in the range [−x, x]. That is: {\displaystyle \operatorname {erf} (x)={\frac {1}{\sqrt {\pi }}}\int _{-x}^{x}e^{-t^{2}}\,dt={\frac {2}{\sqrt {\pi }}}\int _{0}^{x}e^{-t^{2}}\,dt\,.} These integrals cannot be expressed in terms of elementary functions, and are often said to be special functions. However, many numerical approximations are known; see below for more. The two functions are closely related, namely {\displaystyle \Phi (x)={\frac {1}{2}}\left[1+\operatorname {erf} \left({\frac {x}{\sqrt {2}}}\right)\right]\,.} For
the product βσ²∗ is identifiable (where σ²∗ is the variance of the latent regressor x*). This is also an example of a set identifiable model: although the exact value of β cannot be learned, we can guarantee that it must lie somewhere in the interval (β_yx, 1/β_xy), where β_yx is the coefficient in the OLS regression of y on x, and β_xy is the coefficient in the OLS regression of x on y. If we abandon
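These bounds can be checked with a small simulation of the errors-in-variables model. This is a sketch under assumed parameter values (β = 1.5 and the variances below are my choices, not from the text); with measurement error in x, the slope of y on x is attenuated below β while the reciprocal of the slope of x on y overshoots it:

```python
import random

random.seed(42)
beta, var_star, var_eps, var_eta = 1.5, 2.0, 0.5, 0.8  # assumed true values
n = 200_000

xs, ys = [], []
for _ in range(n):
    x_star = random.gauss(0.0, var_star ** 0.5)                    # latent regressor x*
    xs.append(x_star + random.gauss(0.0, var_eta ** 0.5))          # observed x = x* + eta
    ys.append(beta * x_star + random.gauss(0.0, var_eps ** 0.5))   # y = beta x* + eps

sxx = sum(x * x for x in xs)
syy = sum(y * y for y in ys)
sxy = sum(x * y for x, y in zip(xs, ys))
b_yx = sxy / sxx   # OLS slope of y on x: attenuated toward zero
b_xy = sxy / syy   # OLS slope of x on y
# beta is only set identified: b_yx < beta < 1 / b_xy
```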
the products meet specifications.

Normal distribution

Fisher information: {\displaystyle {\mathcal {I}}(\mu ,\sigma )={\begin{pmatrix}1/\sigma ^{2}&0\\0&2/\sigma ^{2}\end{pmatrix}}}

In probability theory and statistics,
the standard normal cumulative distribution function, Q(x) = 1 − Φ(x), is often called the Q-function, especially in engineering texts. It gives the probability that the value of a standard normal random variable X will exceed x: P(X > x). Other definitions of
the standard normal distribution can be expanded by integration by parts into a series: {\displaystyle \Phi (x)={\frac {1}{2}}+{\frac {1}{\sqrt {2\pi }}}\cdot e^{-x^{2}/2}\left[x+{\frac {x^{3}}{3}}+{\frac {x^{5}}{3\cdot 5}}+\cdots +{\frac {x^{2n+1}}{(2n+1)!!}}+\cdots \right]\,.} where !! denotes
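The double-factorial series above can be summed term by term, each term being the previous one multiplied by x²/(2n + 1). A minimal sketch (the name `phi_series` is mine):

```python
import math

def phi_series(x, n_terms=60):
    """Phi(x) = 1/2 + exp(-x^2/2)/sqrt(2 pi) * sum over n >= 0 of x^(2n+1)/(2n+1)!!."""
    term, total = x, x               # first term: x / 1!! = x
    for n in range(1, n_terms):
        term *= x * x / (2 * n + 1)  # now term = x^(2n+1) / (2n+1)!!
        total += term
    return 0.5 + math.exp(-x * x / 2.0) * total / math.sqrt(2.0 * math.pi)
```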
the standard normal distribution, usually denoted with the capital Greek letter Φ, is the integral {\displaystyle \Phi (x)={\frac {1}{\sqrt {2\pi }}}\int _{-\infty }^{x}e^{-t^{2}/2}\,dt\,.} The related error function erf(x) gives
the standard normal distribution. This variate is also called the standardized form of X. The probability density of the standard Gaussian distribution (standard normal distribution, with zero mean and unit variance) is often denoted with the Greek letter ϕ (phi). The alternative form of the Greek letter phi, φ,