In null-hypothesis significance testing, the p-value is the probability of obtaining test results at least as extreme as the result actually observed, under the assumption that the null hypothesis is correct. A very small p-value means that such an extreme observed outcome would be very unlikely under the null hypothesis. Even though reporting p-values of statistical tests is common practice in academic publications of many quantitative fields, misinterpretation and misuse of p-values are widespread and have been a major topic in mathematics and metascience.
In 2016, the American Statistical Association (ASA) made a formal statement that "p-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone" and that "a p-value, or statistical significance, does not measure the size of an effect or the importance of a result" or "evidence regarding a model or hypothesis". That said,
$P(a \leq X \leq b) = \int_a^b f(x)\,dx.$ This is the definition of a probability density function, so that absolutely continuous probability distributions are exactly those with a probability density function. In particular, the probability for $X$ to take any single value
$a$ (that is, $a \leq X \leq a$) is zero, because an integral with coinciding upper and lower limits is always equal to zero. If the interval $[a, b]$ is replaced by any measurable set $A$,
$\varphi(x) = ax^2 + b$, where $a > 0$. This also holds in the multidimensional case. Unlike the expected absolute deviation, the variance of a variable has units that are the square of the units of the variable itself. For example, a variable measured in meters will have a variance measured in meters squared. For this reason, describing data sets via their standard deviation or root mean square deviation
$I = [a, b] \subset \mathbb{R}$ the probability of $X$ belonging to $I$ is given by the integral of $f$ over $I$: $P(a \leq X \leq b) = \int$
A binomial distribution: In the 1770s Laplace considered the statistics of almost half a million births. The statistics showed an excess of boys compared to girls. He concluded by calculation of a p-value that the excess was a real, but unexplained, effect. The p-value was first formally introduced by Karl Pearson, in his Pearson's chi-squared test, using the chi-squared distribution and notated as capital P. The p-values for
A measurable space $(\mathcal{X}, \mathcal{A})$. Given that probabilities of events of the form $\{\omega \in \Omega \mid X(\omega) \in A\}$ satisfy Kolmogorov's probability axioms,
A random variable. The standard deviation (SD) is obtained as the square root of the variance. Variance is a measure of dispersion, meaning it is a measure of how far a set of numbers is spread out from their average value. It is the second central moment of a distribution, and the covariance of the random variable with itself, and it is often represented by $\sigma^2$, $s^2$, $\operatorname{Var}(X)$, $V(X)$, or $\mathbb{V}(X)$. An advantage of variance as
A simple or point hypothesis refers to a hypothesis where the parameter's value is assumed to be a single number. In contrast, in a composite hypothesis the parameter's value is given by a set of numbers. When the null-hypothesis is composite (or the distribution of the statistic is discrete), then when the null-hypothesis is true the probability of obtaining a p-value less than or equal to any number between 0 and 1
A 2019 task force by ASA has issued a statement on statistical significance and replicability, concluding with: "p-values and significance tests, when properly applied and interpreted, increase the rigor of the conclusions drawn from data". In statistics, every conjecture concerning the unknown probability distribution of a collection of random variables representing the observed data $X$ in some study
A Bernoulli distribution with parameter $p$. This is a transformation of discrete random variable. For a distribution function $F$ of an absolutely continuous random variable, an absolutely continuous random variable must be constructed. $F^{\mathit{inv}}$, an inverse function of $F$, relates to
A coin toss, a roll of a die) and the probabilities are encoded by a discrete list of the probabilities of the outcomes; in this case the discrete probability distribution is known as probability mass function. On the other hand, absolutely continuous probability distributions are applicable to scenarios where the set of possible outcomes can take on values in a continuous range (e.g. real numbers), such as
A constant, the variance is scaled by the square of that constant: $\operatorname{Var}(aX) = a^2 \operatorname{Var}(X).$ The variance of a sum of two random variables is given by $\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X, Y),$ where $\operatorname{Cov}(X, Y)$ is the covariance. In general, for the sum of $N$ random variables $\{X_1, \dots, X_N\}$,
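The scaling rule and the two-variable sum rule for the variance can be checked numerically. The following is a minimal sketch, not a reference implementation: the helper names `pvar` and `pcov` and the sample values are hypothetical, and both helpers use the population (divide by n) convention with exact rational arithmetic so that the identities hold without rounding error.

```python
from fractions import Fraction


def pvar(xs):
    """Population variance of a finite list of equally likely values."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** 2 for x in xs) / n


def pcov(xs, ys):
    """Population covariance of two equally long lists of paired values."""
    n = len(xs)
    mean_x = Fraction(sum(xs), n)
    mean_y = Fraction(sum(ys), n)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


# Illustrative data (assumed, not from the article).
xs = [1, 4, 2, 8, 5]
ys = [3, 1, 7, 2, 6]
a = 3

scaled = [a * x for x in xs]               # realizations of aX
sums = [x + y for x, y in zip(xs, ys)]     # realizations of X + Y

print(pvar(scaled) == a ** 2 * pvar(xs))                        # Var(aX) = a^2 Var(X)
print(pvar(sums) == pvar(xs) + pvar(ys) + 2 * pcov(xs, ys))     # sum rule
```

Both comparisons are exact because `Fraction` avoids floating-point rounding; with floats, a tolerance-based comparison would be needed instead.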
A continuous function $\varphi$ satisfies $\operatorname{argmin}_m \mathrm{E}(\varphi(X - m)) = \mathrm{E}(X)$ for all random variables $X$, then it is necessarily of the form $\varphi(x) =$
A discrete probability distribution, there is a countable set $A$ with $P(X \in A) = 1$ and a probability mass function $p$. If $E$ is any event, then $P(X \in E) = \sum_{\omega \in A} p(\omega)\,\delta_\omega(E),$ or in short, $P_X = \sum_{\omega \in A} p(\omega)\,\delta_\omega.$ Similarly, discrete distributions can be represented with
A discrete random variable $X$, let $u_0, u_1, \dots$ be the values it can take with non-zero probability. Denote $\Omega_i = X^{-1}(u_i) = \{\omega : X(\omega) = u_i\},\ i = 0, 1, 2, \dots$ These are disjoint sets, and for such sets $P\left(\bigcup_i \Omega_i\right) = \sum_i P(\Omega_i) = \sum_i P(X = u_i) = 1.$ It follows that
A distribution, then the sample variance calculated from that infinite set will match the value calculated using the distribution's equation for variance. Variance has a central role in statistics, where some ideas that use it include descriptive statistics, statistical inference, hypothesis testing, goodness of fit, and Monte Carlo sampling. The variance of a random variable $X$
A measure of dispersion is that it is more amenable to algebraic manipulation than other measures of dispersion such as the expected absolute deviation; for example, the variance of a sum of uncorrelated random variables is equal to the sum of their variances. A disadvantage of the variance for practical applications is that, unlike the standard deviation, its units differ from the random variable, which
A model (the null hypothesis) and the alpha level α (most commonly 0.05). After analyzing the data, if the p-value is less than α, that is taken to mean that the observed data is sufficiently inconsistent with the null hypothesis for the null hypothesis to be rejected. However, that does not prove that the null hypothesis is false. The p-value does not, in itself, establish probabilities of hypotheses. Rather, it
A more general definition of density functions and the equivalent absolutely continuous measures see absolutely continuous measure. In the measure-theoretic formalization of probability theory, a random variable is defined as a measurable function $X$ from a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ to
A multivariate distribution (a joint probability distribution) gives the probabilities of a random vector – a list of two or more random variables – taking on various combinations of values. Important and commonly encountered univariate probability distributions include the binomial distribution, the hypergeometric distribution, and the normal distribution. A commonly encountered multivariate distribution
A myriad of phenomena, since most practical distributions are supported on relatively simple subsets, such as hypercubes or balls. However, this is not always the case, and there exist phenomena with supports that are actually complicated curves $\gamma : [a, b] \rightarrow \mathbb{R}^n$ within some space $\mathbb{R}^n$ or similar. In these cases,
A number in $[0, 1] \subseteq \mathbb{R}$. The probability function $P$ can take as argument subsets of the sample space itself, as in the coin toss example, where the function $P$ was defined so that P(heads) = 0.5 and P(tails) = 0.5. However, because of
A practising statistician would consider the more important to avoid (which is a subjective judgment) is called the error of the first kind. The first demand of the mathematical theory is to deduce such test criteria as would ensure that the probability of committing an error of the first kind would equal (or approximately equal, or not exceed) a preassigned number α, such as α = 0.05 or 0.01, etc. This number
A random variable takes values from a continuum then by convention, any individual outcome is assigned probability zero. For such continuous random variables, only events that include infinitely many outcomes such as intervals have probability greater than 0. For example, consider measuring the weight of a piece of ham in the supermarket, and assume the scale can provide arbitrarily many digits of precision. Then,
A right-tailed test is considered, which would be the case if one is actually interested in the possibility that the coin is biased towards falling heads, then the p-value of this result is the chance of a fair coin landing on heads at least 14 times out of 20 flips. That probability can be computed from binomial coefficients as $\Pr(\text{at least 14 heads}) = \frac{1}{2^{20}} \sum_{k=14}^{20} \binom{20}{k} = \frac{60\,460}{1\,048\,576} \approx 0.058.$ This probability is the p-value, considering only extreme results that favor heads. This
A role in multiple testing. First, it corresponds to a generic, more robust alternative to the p-value that can deal with optional continuation of experiments. Second, it is also used to abbreviate "expect value", which is the expected number of times that one expects to obtain a test statistic at least as extreme as the one that was actually observed if one assumes that the null hypothesis
A sample is considered an estimate of the full population variance. There are multiple ways to calculate an estimate of the population variance, as discussed in the section below. The two kinds of variance are closely related. To see how, consider that a theoretical probability distribution can be used as a generator of hypothetical observations. If an infinite number of observations are generated using
A set of $n$ equally likely values can be equivalently expressed, without directly referring to the mean, in terms of squared deviations of all pairwise squared distances of points from each other: $\operatorname{Var}(X) = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{1}{2}(x_i - x_j)^2 = \frac{1}{n^2} \sum_{i} \sum_{j > i} (x_i - x_j)^2.$ If the random variable $X$ has a probability density function $f(x)$, and $F(x)$
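The pairwise form of the variance of $n$ equally likely values, namely the sum of $(x_i - x_j)^2$ over all pairs $i < j$ divided by $n^2$, can be compared against the usual mean-based form. This is an illustrative sketch only; the function names and the sample values are assumptions chosen for the example.

```python
from fractions import Fraction
from itertools import combinations


def variance_from_mean(xs):
    """Population variance: mean squared deviation from the average."""
    n = len(xs)
    mu = Fraction(sum(xs), n)
    return sum((x - mu) ** 2 for x in xs) / n


def variance_from_pairs(xs):
    """Population variance from pairwise squared distances, without using the mean."""
    n = len(xs)
    total = sum((a - b) ** 2 for a, b in combinations(xs, 2))  # each pair i < j once
    return Fraction(total, n * n)


values = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data
print(variance_from_mean(values), variance_from_pairs(values))
```

The pairwise version does quadratic work in $n$, so it is mainly of conceptual interest; it shows that the variance depends only on the distances between points, not on their location.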
A set of observations. When variance is calculated from observations, those observations are typically measured from a real-world system. If all possible observations of the system are present, then the calculated variance is called the population variance. Normally, however, only a subset is available, and the variance calculated from this is called the sample variance. The variance calculated from
A set of probability zero, where $1_A$ is the indicator function of $A$. This may serve as an alternative definition of discrete random variables. A special case is the discrete distribution of a random variable that can take on only one fixed value; in other words, it is a deterministic distribution. Expressed formally,
p-value - Misplaced Pages Continue
A sine, $\sin(t)$, whose limit when $t \rightarrow \infty$ does not converge. Formally, the measure exists only if the limit of the relative frequency converges when the system is observed into the infinite future. The branch of dynamical systems that studies
A single p-value relating to a hypothesis is observed, so the p-value is interpreted by a significance test, and no effort is made to estimate the distribution it was drawn from. When a collection of p-values are available (e.g. when considering a group of studies on the same subject), the distribution of p-values is sometimes called a p-curve. A p-curve can be used to assess the reliability of scientific literature, such as by detecting publication bias or p-hacking. In parametric hypothesis testing problems,
A single number, such as a t-statistic or an F-statistic. As such, the test statistic follows a distribution determined by the function used to define that test statistic and the distribution of the input observational data. For the important case in which the data are hypothesized to be a random sample from a normal distribution, depending on the nature of the test statistic and the hypotheses of interest about its distribution, different null hypothesis tests have been developed. Some such tests are
A table of p-values, Fisher instead inverted the CDF, publishing a list of values of the test statistic for given fixed p-values; this corresponds to computing the quantile function (inverse CDF). As an example of a statistical test, an experiment is performed to determine whether a coin flip is fair (equal chance of landing heads or tails) or unfairly biased (one outcome being more likely than
A uniform distribution between 0 and 1. To construct a random Bernoulli variable for some $0 < p < 1$, we define $X = \begin{cases} 1, & \text{if } U < p \\ 0, & \text{if } U \geq p \end{cases}$ so that $\Pr(X = 1) = \Pr(U < p) = p, \quad \Pr(X = 0) = \Pr(U \geq p) = 1 - p.$ This random variable $X$ has
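The construction of a Bernoulli variable from a uniform variate translates almost directly into code. The sketch below is a hedged illustration: the function name `bernoulli_from_uniform`, the seed, and the choice $p = 0.3$ are assumptions made for the example, and Python's `random.Random.random()` is used as the source of uniform pseudo-random numbers on the half-open interval [0, 1).

```python
import random


def bernoulli_from_uniform(p: float, rng: random.Random) -> int:
    """Return 1 if U < p and 0 otherwise, where U is uniform on [0, 1)."""
    u = rng.random()
    return 1 if u < p else 0


rng = random.Random(0)   # fixed seed so the run is reproducible
p = 0.3                  # illustrative parameter
samples = [bernoulli_from_uniform(p, rng) for _ in range(100_000)]
freq = sum(samples) / len(samples)
print(f"empirical Pr(X = 1) = {freq:.3f} (target {p})")
```

With this many draws the empirical frequency should sit close to $p$, which is the practical content of $\Pr(X = 1) = \Pr(U < p) = p$.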
Is 0.115. In the above example: Pr(no. of heads ≤ 14) = 1 − Pr(no. of heads ≥ 14) + Pr(no. of heads = 14) = 1 − 0.058 + 0.037 = 0.979; however, the symmetry of this binomial distribution makes it an unnecessary computation to find the smaller of the two probabilities. Here, the calculated p-value exceeds 0.05, meaning that the data falls within the range of what would happen 95% of
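The coin-flip figures for 14 heads in 20 flips can be reproduced from binomial coefficients. The following is a small sketch using only the standard library; the function name `binom_upper_tail` is hypothetical, and doubling the one-sided value is justified here only because the fair-coin binomial distribution is symmetric.

```python
from math import comb


def binom_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """Pr(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


one_sided = binom_upper_tail(14, 20)   # right-tailed p-value
two_sided = 2 * one_sided              # relies on symmetry of the fair-coin case
print(round(one_sided, 3), round(two_sided, 3))
```

For an asymmetric null (a coin with $p \neq 0.5$) the two-sided p-value would need a different definition, so the doubling shortcut should not be carried over unchanged.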
Is a countable set with $P(X \in A) = 1$. Thus the discrete random variables (i.e. random variables whose probability distribution is discrete) are exactly those with a probability mass function $p(x) = P(X = x)$. In the case where
Is a discrete random variable assuming possible values $y_1, y_2, y_3, \ldots$ with corresponding probabilities $p_1, p_2, p_3, \ldots$, then in the formula for total variance,
Is a mathematical description of a random phenomenon in terms of its sample space and the probabilities of events (subsets of the sample space). For instance, if X is used to denote the outcome of a coin toss ("the experiment"), then the probability distribution of X would take the value 0.5 (1 in 2 or 1/2) for X = heads, and 0.5 for X = tails (assuming that the coin is fair). More commonly, probability distributions are used to compare
Is a probability distribution on the real numbers with uncountably many possible values, such as a whole interval in the real line, and where the probability of any event can be expressed as an integral. More precisely, a real random variable $X$ has an absolutely continuous probability distribution if there is a function $f : \mathbb{R} \to [0, \infty]$ such that for each interval $I = [$
Is a tool for deciding whether to reject the null hypothesis. According to the ASA, there is widespread agreement that p-values are often misused and misinterpreted. One practice that has been particularly criticized is accepting the alternative hypothesis for any p-value nominally less than 0.05 without other supporting evidence. Although p-values are helpful in assessing how incompatible
Is based on normal approximations to appropriate statistics obtained by invoking the central limit theorem for large samples, as in the case of Pearson's chi-squared test. Thus computing a p-value requires a null hypothesis, a test statistic (together with deciding whether the researcher is performing a one-tailed test or a two-tailed test), and data. Even though computing the test statistic on given data may be easy, computing
Is called a one-tailed test. However, one might be interested in deviations in either direction, favoring either heads or tails. The two-tailed p-value, which considers deviations favoring either heads or tails, may instead be calculated. As the binomial distribution is symmetrical for a fair coin, the two-sided p-value is simply twice the above calculated single-sided p-value: the two-sided p-value
Is called a statistical hypothesis. If we state one hypothesis only and the aim of the statistical test is to see whether this hypothesis is tenable, but not to investigate other specific hypotheses, then such a test is called a null hypothesis test. As our statistical hypothesis will, by definition, state some property of the distribution, the null hypothesis is the default hypothesis under which that property does not exist. The null hypothesis
Is called the level of significance. In a significance test, the null hypothesis $H_0$ is rejected if the p-value is less than or equal to a predefined threshold value $\alpha$, which is referred to as the alpha level or significance level. $\alpha$
Is closely connected to a main question of interest in the study. The p-value is used in the context of null hypothesis testing in order to quantify the statistical significance of a result, the result being the observed value of the chosen statistic $T$. The lower the p-value is, the lower the probability of getting that result if the null hypothesis were true. A result
Is common to denote as $P(X \in E)$ the probability that a certain value of the variable $X$ belongs to a certain event $E$. The above probability function only characterizes a probability distribution if it satisfies all the Kolmogorov axioms, that is: The concept of probability function
Is credited as "… the first use of significance tests …" the first example of reasoning about statistical significance, and "… perhaps the first published report of a nonparametric test …", specifically the sign test; see details at Sign test § History. The same question was later addressed by Pierre-Simon Laplace, who instead used a parametric test, modeling the number of male births with
Is defined as $F(x) = P(X \leq x).$ The cumulative distribution function of any real-valued random variable has the properties: Conversely, any function $F : \mathbb{R} \to \mathbb{R}$ that satisfies the first four of
Is given by on the interval [0, ∞). Its mean can be shown to be Using integration by parts and making use of the expected value already calculated, we have: Thus, the variance of X is given by A fair six-sided die can be modeled as a discrete random variable, X, with outcomes 1 through 6, each with equal probability 1/6. The expected value of X is $(1 + 2 + 3 + 4 + 5 + 6)/6 = 7/2.$ Therefore,
Is heated debate on the feasibility of these alternatives. Others have suggested to remove fixed significance thresholds and to interpret p-values as continuous indices of the strength of evidence against the null hypothesis. Yet others suggested to report alongside p-values the prior probability of a real effect that would be required to obtain a false positive risk (i.e. the probability that there
Is made more rigorous by defining it as the element of a probability space $(X, \mathcal{A}, P)$, where $X$ is the set of possible outcomes, $\mathcal{A}$ is the set of all subsets $E \subset X$ whose probability can be measured, and $P$
Is no real effect) below a pre-specified threshold (e.g. 5%). That said, in 2019 a task force by ASA had convened to consider the use of statistical methods in scientific studies, specifically hypothesis tests and p-values, and their connection to replicability. It states that "Different measures of uncertainty can complement one another; no single measure serves all purposes", citing p-value as one of these measures. They also stress that p-values can provide valuable information when considering
Is not derived from the data, but rather is set by the researcher before examining the data. $\alpha$ is commonly set to 0.05, though lower alpha levels are sometimes used. The 0.05 value (equivalent to 1/20 chances) was originally proposed by R. Fisher in 1925 in his famous book entitled "Statistical Methods for Research Workers". In 2018, a group of statisticians led by Daniel Benjamin proposed
Is not normally distributed. Different tests of the same null hypothesis would be more or less sensitive to different alternatives. However, even if we do manage to reject the null hypothesis for all 3 alternatives, and even if we know that the distribution is normal and variance is 1, the null hypothesis test does not tell us which non-zero values of the mean are now most plausible. The more independent observations from
Is often preferred over using the variance. In the dice example the standard deviation is √2.9 ≈ 1.7, slightly larger than the expected absolute deviation of 1.5. The standard deviation and the expected absolute deviation can both be used as an indicator of the "spread" of a distribution. The standard deviation is more amenable to algebraic manipulation than the expected absolute deviation, and, together with variance and its generalization covariance,
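The dice figures quoted above (variance about 2.9, standard deviation about 1.7, expected absolute deviation 1.5) follow from the definition of each quantity for a fair six-sided die. The snippet below is a minimal check under that assumption; the variable names are illustrative.

```python
from fractions import Fraction
from math import sqrt

faces = range(1, 7)
prob = Fraction(1, 6)   # each face of a fair die is equally likely

mean = sum(prob * x for x in faces)                    # expected value
variance = sum(prob * (x - mean) ** 2 for x in faces)  # expected squared deviation
abs_dev = sum(prob * abs(x - mean) for x in faces)     # expected absolute deviation
std_dev = sqrt(variance)                               # square root of the variance

print(mean, variance, abs_dev, round(std_dev, 3))
```

Keeping the intermediate quantities as exact fractions makes it clear that the "2.9" in the text is a rounding of 35/12.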
Is possible because this measurement does not require as much precision from the underlying equipment. Absolutely continuous probability distributions can be described in several ways. The probability density function describes the infinitesimal probability of any given value, and the probability that the outcome lies in a given interval can be computed by integrating the probability density function over that interval. An alternative description of
Is said to be statistically significant if it allows us to reject the null hypothesis. All other things being equal, smaller p-values are taken as stronger evidence against the null hypothesis. Loosely speaking, rejection of the null hypothesis implies that there is sufficient evidence against it. As a particular example, if a null hypothesis states that a certain summary statistic $T$ follows
Is still less than or equal to that number. In other words, it remains the case that very small values are relatively unlikely if the null-hypothesis is true, and that a significance test at level $\alpha$ is obtained by rejecting the null-hypothesis if the p-value is less than or equal to $\alpha$. For example, when testing
Is studied, the observed states from the subset are as indicated in red. So one could ask what is the probability of observing a state in a certain position of the red subset; if such a probability exists, it is called the probability measure of the system. This kind of complicated support appears quite frequently in dynamical systems. It is not simple to establish that the system has a probability measure, and
Is the expected value of the squared deviation from the mean of $X$, $\mu = \operatorname{E}[X]$: $\operatorname{Var}(X) = \operatorname{E}\left[(X - \mu)^2\right].$ This definition encompasses random variables that are generated by processes that are discrete, continuous, neither, or mixed. The variance can also be thought of as
Is the multivariate normal distribution. Besides the probability function, the cumulative distribution function, the probability mass function and the probability density function, the moment generating function and the characteristic function also serve to identify a probability distribution, as they uniquely determine an underlying cumulative distribution function. Some key concepts and terms, widely used in
Is the set of all possible outcomes of a random phenomenon being observed. The sample space may be any set: a set of real numbers, a set of descriptive labels, a set of vectors, a set of arbitrary non-numerical values, etc. For example, the sample space of a coin flip could be Ω = {"heads", "tails"}. To define probability distributions for the specific case of random variables (so
Is the area under the probability density function from $-\infty$ to $x$, as shown in figure 1. A probability distribution can be described in various forms, such as by a probability mass function or a cumulative distribution function. One of
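Is the corresponding cumulative distribution function, then $\operatorname{Var}(X) = \sigma^2 = \int_{\mathbb{R}} (x - \mu)^2 f(x)\,dx = \int_{\mathbb{R}} x^2 f(x)\,dx - \mu^2,$ or equivalently, $\operatorname{Var}(X) = \int_{\mathbb{R}} (x - \mu)^2\,dF(x) = \int_{\mathbb{R}} x^2\,dF(x) - \mu^2,$ where $\mu$ is the expected value of $X$ given by $\mu = \int_{\mathbb{R}} x f(x)\,dx = \int_{\mathbb{R}} x\,dF(x).$ In these formulas, the integrals with respect to $dx$ and $dF(x)$ are Lebesgue and Lebesgue–Stieltjes integrals, respectively. If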
Is the expected value. That is, $\mu = \sum_{i=1}^{n} p_i x_i.$ (When such a discrete weighted variance is specified by weights whose sum is not 1, then one divides by the sum of the weights.) The variance of a collection of $n$ equally likely values can be written as $\operatorname{Var}(X) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2,$ where $\mu$ is the average value. That is, $\mu = \frac{1}{n} \sum_{i=1}^{n} x_i.$ The variance of
Is the only parameter), and if that distribution is continuous, then when the null-hypothesis is true, the p-value is uniformly distributed between 0 and 1. Regardless of the truth of the $H_0$, the p-value is not fixed; if the same test is repeated independently with fresh data, one will typically obtain a different p-value in each iteration. Usually only
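The claim that the p-value is uniformly distributed under a true point null with a continuous test statistic can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: it uses a one-sided one-sample Z-test with known unit variance, a true mean equal to the null value 0, samples of size 10, and a fixed seed; the function name `one_sided_z_pvalue` is hypothetical. `statistics.NormalDist` from the standard library supplies the normal CDF.

```python
import random
from statistics import NormalDist

rng = random.Random(42)      # fixed seed for reproducibility
std_normal = NormalDist()    # standard normal, used for the CDF


def one_sided_z_pvalue(sample, mu0=0.0, sigma=1.0):
    """Right-tailed p-value of a one-sample Z-test with known sigma."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / n**0.5)
    return 1.0 - std_normal.cdf(z)


# Repeat the same test on fresh data drawn under the null hypothesis.
pvals = [
    one_sided_z_pvalue([rng.gauss(0.0, 1.0) for _ in range(10)])
    for _ in range(20_000)
]

# Under uniformity, the fraction of p-values <= a should be close to a.
frac_below = {a: sum(p <= a for p in pvals) / len(pvals) for a in (0.05, 0.25, 0.5)}
print(frac_below)
```

Each repetition yields a different p-value, and the fraction falling below any threshold tracks that threshold, which is why rejecting when $p \leq \alpha$ gives a test of level $\alpha$.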
Is the probability distribution of a random variable that can take on only a countable number of values (almost surely) which means that the probability of any event $E$ can be expressed as a (finite or countably infinite) sum: $P(X \in E) = \sum_{\omega \in A \cap E} P(X = \omega),$ where $A$
Is the probability function, or probability measure, that assigns a probability to each of these measurable subsets $E \in \mathcal{A}$. Probability distributions usually belong to one of two classes. A discrete probability distribution is applicable to the scenarios where the set of possible outcomes is discrete (e.g.
Is true. This expect-value is the product of the number of tests and the p-value. The q-value is the analog of the p-value with respect to the positive false discovery rate. It is used in multiple hypothesis testing to maintain statistical power while minimizing the false positive rate. The Probability of Direction (pd) is the Bayesian numerical equivalent of the p-value. It corresponds to
Is typically that some parameter (such as a correlation or a difference between means) in the populations of interest is zero. Our hypothesis might specify the probability distribution of $X$ precisely, or it might only specify that it belongs to some class of distributions. Often, we reduce the data to a single numerical statistic, e.g., $T$, whose marginal probability distribution
Is used frequently in theoretical statistics; however the expected absolute deviation tends to be more robust as it is less sensitive to outliers arising from measurement anomalies or an unduly heavy-tailed distribution. Variance is invariant with respect to changes in a location parameter. That is, if a constant is added to all values of the variable, the variance is unchanged: $\operatorname{Var}(X + a) = \operatorname{Var}(X).$ If all values are scaled by
10656-413: Is why the standard deviation is more commonly reported as a measure of dispersion once the calculation is finished. Another disadvantage is that the variance is not finite for many distributions. There are two distinct concepts that are both called "variance". One, as discussed above, is part of a theoretical probability distribution and is defined by an equation. The other variance is a characteristic of
10800-521: The Z -statistic belonging to the one-sided one-sample Z -test. For each possible value of the theoretical mean, the Z -test statistic has a different probability distribution. In these circumstances the p -value is defined by taking the least favorable null-hypothesis case, which is typically on the border between null and alternative. This definition ensures the complementarity of p-values and alpha-levels: α = 0.05 {\displaystyle \alpha =0.05} means one only rejects
10944-517: The z -test for hypotheses concerning the mean of a normal distribution with known variance, the t -test based on Student's t -distribution of a suitable statistic for hypotheses concerning the mean of a normal distribution when the variance is unknown, the F -test based on the F -distribution of yet another statistic for hypotheses concerning the variance. For data of other nature, for instance, categorical (discrete) data, test statistics might be constructed whose null hypothesis distribution
11088-1175: The Dirac delta function as a generalized probability density function f {\displaystyle f} , where f ( x ) = ∑ ω ∈ A p ( ω ) δ ( x − ω ) , {\displaystyle f(x)=\sum _{\omega \in A}p(\omega )\delta (x-\omega ),} which means P ( X ∈ E ) = ∫ E f ( x ) d x = ∑ ω ∈ A p ( ω ) ∫ E δ ( x − ω ) = ∑ ω ∈ A ∩ E p ( ω ) {\displaystyle P(X\in E)=\int _{E}f(x)\,dx=\sum _{\omega \in A}p(\omega )\int _{E}\delta (x-\omega )=\sum _{\omega \in A\cap E}p(\omega )} for any event E . {\displaystyle E.} For
11232-520: The Poisson distribution , the Bernoulli distribution , the binomial distribution , the geometric distribution , the negative binomial distribution and categorical distribution . When a sample (a set of observations) is drawn from a larger population, the sample points have an empirical distribution that is discrete, and which provides information about the population distribution. Additionally,
11376-457: The chi-squared distribution (for various values of χ and degrees of freedom), now notated as P, were calculated in ( Elderton 1902 ), collected in ( Pearson 1914 , pp. xxxi–xxxiii, 26–28, Table XII). Ronald Fisher formalized and popularized the use of the p -value in statistics, with it playing a central role in his approach to the subject. In his highly influential book Statistical Methods for Research Workers (1925), Fisher proposed
11520-419: The conditional variance Var ( X ∣ Y ) {\displaystyle \operatorname {Var} (X\mid Y)} may be understood as follows. Given any particular value y of the random variable Y , there is a conditional expectation E ( X ∣ Y = y ) {\displaystyle \operatorname {E} (X\mid Y=y)} given
11664-750: The covariance of a random variable with itself: The variance is also equivalent to the second cumulant of a probability distribution that generates X {\displaystyle X} . The variance is typically designated as Var ( X ) {\displaystyle \operatorname {Var} (X)} , or sometimes as V ( X ) {\displaystyle V(X)} or V ( X ) {\displaystyle \mathbb {V} (X)} , or symbolically as σ X 2 {\displaystyle \sigma _{X}^{2}} or simply σ 2 {\displaystyle \sigma ^{2}} (pronounced " sigma squared"). The expression for
11808-473: The discrete uniform distribution is commonly used in computer programs that make equal-probability random selections between a number of choices. A real-valued discrete random variable can equivalently be defined as a random variable whose cumulative distribution function increases only by jump discontinuities —that is, its cdf increases only where it "jumps" to a higher value, and is constant in intervals without jumps. The points where jumps occur are precisely
11952-400: The half-open interval [0, 1) . These random variates X {\displaystyle X} are then transformed via some algorithm to create a new random variate having the required probability distribution. With this source of uniform pseudo-randomness, realizations of any random variable can be generated. For example, suppose U {\displaystyle U} has
12096-483: The human sex ratio at birth, and used to compute statistical significance compared to the null hypothesis of equal probability of male and female births. John Arbuthnot studied this question in 1710, and examined birth records in London for each of the 82 years from 1629 to 1710. In every year, the number of males born in London exceeded the number of females. Considering more male or more female births as equally likely,
12240-451: The lady tasting tea experiment, which is the archetypal example of the p -value. To evaluate a lady's claim that she ( Muriel Bristol ) could distinguish by taste how tea is prepared (first adding the milk to the cup, then the tea, or first tea, then milk), she was sequentially presented with 8 cups: 4 prepared one way, 4 prepared the other, and asked to determine the preparation of each cup (knowing that there were 4 of each). In that case,
12384-491: The law of total variance is: If X {\displaystyle X} and Y {\displaystyle Y} are two random variables, and the variance of X {\displaystyle X} exists, then The conditional expectation E ( X ∣ Y ) {\displaystyle \operatorname {E} (X\mid Y)} of X {\displaystyle X} given Y {\displaystyle Y} , and
12528-820: The probability distribution of X {\displaystyle X} is the image measure X ∗ P {\displaystyle X_{*}\mathbb {P} } of X {\displaystyle X} , which is a probability measure on ( X , A ) {\displaystyle ({\mathcal {X}},{\mathcal {A}})} satisfying X ∗ P = P X − 1 {\displaystyle X_{*}\mathbb {P} =\mathbb {P} X^{-1}} . Absolutely continuous and discrete distributions with support on R k {\displaystyle \mathbb {R} ^{k}} or N k {\displaystyle \mathbb {N} ^{k}} are extremely useful to model
12672-681: The according equality still holds: P ( X ∈ A ) = ∫ A f ( x ) d x . {\displaystyle P(X\in A)=\int _{A}f(x)\,dx.} An absolutely continuous random variable is a random variable whose probability distribution is absolutely continuous. There are many examples of absolutely continuous probability distributions: normal , uniform , chi-squared , and others . Absolutely continuous probability distributions as defined above are precisely those with an absolutely continuous cumulative distribution function. In this case,
12816-400: The actual experiment, Bristol correctly classified all 8 cups.) Fisher reiterated the p = 0.05 threshold and explained its rationale, stating: It is usual and convenient for experimenters to take 5 per cent as a standard level of significance, in the sense that they are prepared to ignore all results which fail to reach this standard, and, by this means, to eliminate from further discussion
12960-682: The adoption of the 0.005 value as standard value for statistical significance worldwide. Different p -values based on independent sets of data can be combined, for instance using Fisher's combined probability test . The p -value is a function of the chosen test statistic T {\displaystyle T} and is therefore a random variable . If the null hypothesis fixes the probability distribution of T {\displaystyle T} precisely (e.g. H 0 : θ = θ 0 , {\displaystyle H_{0}:\theta =\theta _{0},} where θ {\displaystyle \theta }
13104-409: The coin 6 times no matter what happens, then the second definition of p -value would mean that the p -value of "3 heads 3 tails" is exactly 1. Thus, the "at least as extreme" definition of p -value is deeply contextual and depends on what the experimenter planned to do even in situations that did not occur. P -value computations date back to the 1700s, where they were computed for
13248-402: The cumulative distribution function F {\displaystyle F} has the form F ( x ) = P ( X ≤ x ) = ∫ − ∞ x f ( t ) d t {\displaystyle F(x)=P(X\leq x)=\int _{-\infty }^{x}f(t)\,dt} where f {\displaystyle f} is a density of
13392-588: The data are with a specified statistical model, contextual factors must also be considered, such as "the design of a study, the quality of the measurements, the external evidence for the phenomenon under study, and the validity of assumptions that underlie the data analysis". Another concern is that the p -value is often misunderstood as being the probability that the null hypothesis is true. Some statisticians have proposed abandoning p -values and focusing more on other inferential statistics, such as confidence intervals , likelihood ratios , or Bayes factors , but there
13536-427: The distribution is by means of the cumulative distribution function , which describes the probability that the random variable is no larger than a given value (i.e., P ( X < x ) {\displaystyle \ {\boldsymbol {\mathcal {P}}}(X<x)\ } for some x {\displaystyle \ x\ } ). The cumulative distribution function
13680-568: The event Y = y . This quantity depends on the particular value y ; it is a function g ( y ) = E ( X ∣ Y = y ) {\displaystyle g(y)=\operatorname {E} (X\mid Y=y)} . That same function evaluated at the random variable Y is the conditional expectation E ( X ∣ Y ) = g ( Y ) . {\displaystyle \operatorname {E} (X\mid Y)=g(Y).} In particular, if Y {\displaystyle Y}
13824-423: The exact p -value can be used, and the strength of evidence can and will be revised with further experimentation. In contrast, decision procedures require a clear-cut decision, yielding an irreversible action, and the procedure is based on costs of error, which, he argues, are inapplicable to scientific research. The E-value can refer to two concepts, both of which are related to the p-value and both of which play
13968-439: The existence of a probability measure is ergodic theory . Note that even in these cases, the probability distribution, if it exists, might still be termed "absolutely continuous" or "discrete" depending on whether the support is uncountable or countable, respectively. Most algorithms are based on a pseudorandom number generator that produces numbers X {\displaystyle X} that are uniformly distributed in
14112-490: The fairness of the coin. In general, optional stopping changes how p-value is calculated. Suppose we design the experiment as follows: This experiment has 7 types of outcomes: 2 heads, 2 tails, 5 heads 1 tail, ..., 1 head 5 tails. We now calculate the p -value of the "3 heads 3 tails" outcome. If we use the test statistic heads / tails {\displaystyle {\text{heads}}/{\text{tails}}} , then under
14256-679: The first term on the right-hand side becomes where σ i 2 = Var [ X ∣ Y = y i ] {\displaystyle \sigma _{i}^{2}=\operatorname {Var} [X\mid Y=y_{i}]} . Similarly, the second term on the right-hand side becomes where μ i = E [ X ∣ Y = y i ] {\displaystyle \mu _{i}=\operatorname {E} [X\mid Y=y_{i}]} and μ = ∑ i p i μ i {\displaystyle \mu =\sum _{i}p_{i}\mu _{i}} . Thus
14400-449: The function x 2 f ( x ) {\displaystyle x^{2}f(x)} is Riemann-integrable on every finite interval [ a , b ] ⊂ R , {\displaystyle [a,b]\subset \mathbb {R} ,} then where the integral is an improper Riemann integral . The exponential distribution with parameter λ is a continuous distribution whose probability density function
14544-467: The generator of random variable X {\displaystyle X} is discrete with probability mass function x 1 ↦ p 1 , x 2 ↦ p 2 , … , x n ↦ p n {\displaystyle x_{1}\mapsto p_{1},x_{2}\mapsto p_{2},\ldots ,x_{n}\mapsto p_{n}} , then where μ {\displaystyle \mu }
14688-528: The greater part of the fluctuations which chance causes have introduced into their experimental results. He also applies this threshold to the design of experiments, noting that had only 6 cups been presented (3 of each), a perfect classification would have only yielded a p -value of 1 / ( 6 3 ) = 1 / 20 = 0.05 , {\displaystyle 1/{\binom {6}{3}}=1/20=0.05,} which would not have met this level of significance. Fisher also underlined
14832-456: The interpretation of p, as the long-run proportion of values at least as extreme as the data, assuming the null hypothesis is true. In later editions, Fisher explicitly contrasted the use of the p -value for statistical inference in science with the Neyman–Pearson method, which he terms "Acceptance Procedures". Fisher emphasizes that while fixed levels such as 5%, 2%, and 1% are convenient,
14976-510: The inverse is not true, there exist singular distributions , which are neither absolutely continuous nor discrete nor a mixture of those, and do not have a density. An example is given by the Cantor distribution . Some authors however use the term "continuous distribution" to denote all distributions whose cumulative distribution function is absolutely continuous , i.e. refer to absolutely continuous distributions as continuous distributions. For
15120-742: The level p = 0.05, or a 1 in 20 chance of being exceeded by chance, as a limit for statistical significance , and applied this to a normal distribution (as a two-tailed test), thus yielding the rule of two standard deviations (on a normal distribution) for statistical significance (see 68–95–99.7 rule ). He then computed a table of values, similar to Elderton but, importantly, reversed the roles of χ and p. That is, rather than computing p for different values of χ (and degrees of freedom n ), he computed values of χ that yield specified p -values, specifically 0.99, 0.98, 0.95, 0,90, 0.80, 0.70, 0.50, 0.30, 0.20, 0.10, 0.05, 0.02, and 0.01. That allowed computed values of χ to be compared against cutoffs and encouraged
15264-449: The literature on the topic of probability distributions, are listed below. In the special case of a real-valued random variable, the probability distribution can equivalently be represented by a cumulative distribution function instead of a probability measure. The cumulative distribution function of a random variable X {\displaystyle X} with regard to a probability distribution p {\displaystyle p}
15408-741: The main problem is the following. Let t 1 ≪ t 2 ≪ t 3 {\displaystyle t_{1}\ll t_{2}\ll t_{3}} be instants in time and O {\displaystyle O} a subset of the support; if the probability measure exists for the system, one would expect the frequency of observing states inside set O {\displaystyle O} would be equal in interval [ t 1 , t 2 ] {\displaystyle [t_{1},t_{2}]} and [ t 2 , t 3 ] {\displaystyle [t_{2},t_{3}]} , which might not happen; for example, it could oscillate similar to
15552-437: The most general descriptions, which applies for absolutely continuous and discrete variables, is by means of a probability function P : A → R {\displaystyle P\colon {\mathcal {A}}\to \mathbb {R} } whose input space A {\displaystyle {\mathcal {A}}} is a σ-algebra , and gives a real number probability as its output, particularly,
15696-426: The null hypothesis if the p -value is less than or equal to 0.05 {\displaystyle 0.05} , and the hypothesis test will indeed have a maximum type-1 error rate of 0.05 {\displaystyle 0.05} . The p -value is widely used in statistical hypothesis testing , specifically in null hypothesis significance testing. In this method, before conducting the study, one first chooses
15840-485: The null hypothesis is exactly 1 for two-sided p -value, and exactly 19 / 32 {\displaystyle 19/32} for one-sided left-tail p -value, and same for one-sided right-tail p -value. If we consider every outcome that has equal or lower probability than "3 heads 3 tails" as "at least as extreme", then the p -value is exactly 1 / 2. {\displaystyle 1/2.} However, suppose we have planned to simply flip
15984-413: The null hypothesis that a distribution is normal with a mean less than or equal to zero against the alternative that the mean is greater than zero ( H 0 : μ ≤ 0 {\displaystyle H_{0}:\mu \leq 0} , variance known), the null hypothesis does not specify the exact probability distribution of the appropriate test statistic. In this example that would be
16128-445: The null hypothesis was that she had no special ability, the test was Fisher's exact test , and the p -value was 1 / ( 8 4 ) = 1 / 70 ≈ 0.014 , {\displaystyle 1/{\binom {8}{4}}=1/70\approx 0.014,} so Fisher was willing to reject the null hypothesis (consider the outcome highly unlikely to be due to chance) if all were classified correctly. (In
16272-859: The number of dots on the die, has the probability 1 6 ) . {\displaystyle \ {\tfrac {1}{6}}~).} The probability of an event is then defined to be the sum of the probabilities of all outcomes that satisfy the event; for example, the probability of the event "the die rolls an even value" is p ( “ 2 ” ) + p ( “ 4 ” ) + p ( “ 6 ” ) = 1 6 + 1 6 + 1 6 = 1 2 . {\displaystyle \ p({\text{“}}2{\text{”}})+p({\text{“}}4{\text{”}})+p({\text{“}}6{\text{”}})={\tfrac {1}{6}}+{\tfrac {1}{6}}+{\tfrac {1}{6}}={\tfrac {1}{2}}~.} In contrast, when
16416-502: The one obtained. Consider an observed test-statistic t {\displaystyle t} from unknown distribution T {\displaystyle T} . Then the p -value p {\displaystyle p} is what the prior probability would be of observing a test-statistic value at least as "extreme" as t {\displaystyle t} if null hypothesis H 0 {\displaystyle H_{0}} were true. That is: The error that
16560-451: The other). Suppose that the experimental results show the coin turning up heads 14 times out of 20 total flips. The full data X {\displaystyle X} would be a sequence of twenty times the symbol "H" or "T". The statistic on which one might focus could be the total number T {\displaystyle T} of heads. The null hypothesis is that the coin is fair, and coin tosses are independent of one another. If
16704-424: The predicted score and the error score, where the latter two are uncorrelated. Similar decompositions are possible for the sum of squared deviations (sum of squares, S S {\displaystyle {\mathit {SS}}} ): The population variance for a non-negative random variable can be expressed in terms of the cumulative distribution function F using This expression can be used to calculate
16848-493: The probability distribution is supported on the image of such curve, and is likely to be determined empirically, rather than finding a closed formula for it. One example is shown in the figure to the right, which displays the evolution of a system of differential equations (commonly known as the Rabinovich–Fabrikant equations ) that can be used to model the behaviour of Langmuir waves in plasma . When this phenomenon
16992-470: The probability of the observed outcome is 1/2, or about 1 in 4,836,000,000,000,000,000,000,000; in modern terms, the p -value. This is vanishingly small, leading Arbuthnot that this was not due to chance, but to divine providence: "From whence it follows, that it is Art, not Chance, that governs." In modern terms, he rejected the null hypothesis of equally likely male and female births at the p = 1/2 significance level. This and other work by Arbuthnot
17136-533: The probability that X {\displaystyle X} takes any value except for u 0 , u 1 , … {\displaystyle u_{0},u_{1},\dots } is zero, and thus one can write X {\displaystyle X} as X ( ω ) = ∑ i u i 1 Ω i ( ω ) {\displaystyle X(\omega )=\sum _{i}u_{i}1_{\Omega _{i}}(\omega )} except on
17280-458: The probability that it weighs exactly 500 g must be zero because no matter how high the level of precision chosen, it cannot be assumed that there are no non-zero decimal digits in the remaining omitted digits ignored by the precision level. However, for the same use case, it is possible to meet quality control requirements such as that a package of "500 g" of ham must weigh between 490 g and 510 g with at least 98% probability. This
17424-454: The properties above is the cumulative distribution function of some probability distribution on the real numbers. Any probability distribution can be decomposed as the mixture of a discrete , an absolutely continuous and a singular continuous distribution , and thus any cumulative distribution function admits a decomposition as the convex sum of the three according cumulative distribution functions. A discrete probability distribution
17568-424: The proportion of the posterior distribution that is of the median's sign, typically varying between 50% and 100%, and representing the certainty with which an effect is positive or negative. Probability distribution In probability theory and statistics , a probability distribution is the mathematical function that gives the probabilities of occurrence of possible outcomes for an experiment . It
17712-450: The random variable X {\displaystyle X} has a one-point distribution if it has a possible outcome x {\displaystyle x} such that P ( X = x ) = 1. {\displaystyle P(X{=}x)=1.} All other possible outcomes then have probability 0. Its cumulative distribution function jumps immediately from 0 to 1. An absolutely continuous probability distribution
17856-408: The random variable X {\displaystyle X} with regard to the distribution P {\displaystyle P} . Note on terminology: Absolutely continuous distributions ought to be distinguished from continuous distributions , which are those having a continuous cumulative distribution function. Every absolutely continuous distribution is a continuous distribution but
18000-643: The range of values is countably infinite, these values have to decline to zero fast enough for the probabilities to add up to 1. For example, if p ( n ) = 1 2 n {\displaystyle p(n)={\tfrac {1}{2^{n}}}} for n = 1 , 2 , . . . {\displaystyle n=1,2,...} , the sum of probabilities would be 1 / 2 + 1 / 4 + 1 / 8 + ⋯ = 1 {\displaystyle 1/2+1/4+1/8+\dots =1} . Well-known discrete probability distributions used in statistical modeling include
18144-494: The real numbers. A discrete probability distribution is often represented with Dirac measures , the probability distributions of deterministic random variables . For any outcome ω {\displaystyle \omega } , let δ ω {\displaystyle \delta _{\omega }} be the Dirac measure concentrated at ω {\displaystyle \omega } . Given
18288-523: The relative occurrence of many different random values. Probability distributions can be defined in different ways and for discrete or for continuous variables. Distributions with special properties or for especially important applications are given specific names. A probability distribution is a mathematical description of the probabilities of events, subsets of the sample space . The sample space, often represented in notation by Ω , {\displaystyle \ \Omega \ ,}
18432-443: The same probability distribution one has, the more accurate the test will be, and the higher the precision with which one will be able to determine the mean value and show that it is not equal to zero; but this will also increase the importance of evaluating the real-world or scientific relevance of this deviation. The p -value is the probability under the null hypothesis of obtaining a real-valued test statistic at least as extreme as
18576-568: The same value: If a distribution does not have a finite expected value, as is the case for the Cauchy distribution , then the variance cannot be finite either. However, some distributions may not have a finite variance, despite their expected value being finite. An example is a Pareto distribution whose index k {\displaystyle k} satisfies 1 < k ≤ 2. {\displaystyle 1<k\leq 2.} The general formula for variance decomposition or
18720-431: The sample space can be seen as a numeric set), it is common to distinguish between discrete and absolutely continuous random variables . In the discrete case, it is sufficient to specify a probability mass function p {\displaystyle \ p\ } assigning a probability to each possible outcome (e.g. when throwing a fair die , each of the six digits “1” to “6” , corresponding to
18864-440: The sampling distribution under the null hypothesis, and then computing its cumulative distribution function (CDF) is often a difficult problem. Today, this computation is done using statistical software, often via numeric methods (rather than exact formulae), but, in the early and mid 20th century, this was instead done via tables of values, and one interpolated or extrapolated p -values from these discrete values. Rather than using
19008-405: The specific value as well as when compared to some threshold. In general, it stresses that " p -values and significance tests, when properly applied and interpreted, increase the rigor of the conclusions drawn from data". Usually, T {\displaystyle T} is a test statistic . A test statistic is the output of a scalar function of all the observations. This statistic provides
19152-397: The standard normal distribution N ( 0 , 1 ) , {\displaystyle {\mathcal {N}}(0,1),} then the rejection of this null hypothesis could mean that (i) the mean of T {\displaystyle T} is not 0, or (ii) the variance of T {\displaystyle T} is not 1, or (iii) T {\displaystyle T}
19296-448: The temperature on a given day. In the absolutely continuous case, probabilities are described by a probability density function , and the probability distribution is by definition the integral of the probability density function. The normal distribution is a commonly encountered absolutely continuous probability distribution. More complex experiments, such as those involving stochastic processes defined in continuous time , may demand
19440-424: The time, if the coin were fair. Hence, the null hypothesis is not rejected at the 0.05 level. However, had one more head been obtained, the resulting p -value (two-tailed) would have been 0.0414 (4.14%), in which case the null hypothesis would be rejected at the 0.05 level. The difference between the two meanings of "extreme" appear when we consider a sequential hypothesis testing, or optional stopping, for
19584-512: The total variance is given by A similar formula is applied in analysis of variance , where the corresponding formula is here M S {\displaystyle {\mathit {MS}}} refers to the Mean of the Squares. In linear regression analysis the corresponding formula is This can also be derived from the additivity of variances, since the total (observed) score is the sum of
19728-410: The uniform variable U {\displaystyle U} : U ≤ F ( x ) = F i n v ( U ) ≤ x . {\displaystyle {U\leq F(x)}={F^{\mathit {inv}}(U)\leq x}.} Variance In probability theory and statistics , variance is the expected value of the squared deviation from the mean of
19872-409: The use of p -values (especially 0.05, 0.02, and 0.01) as cutoffs, instead of computing and reporting p -values themselves. The same type of tables were then compiled in ( Fisher & Yates 1938 ), which cemented the approach. As an illustration of the application of p -values to the design and interpretation of experiments, in his following book The Design of Experiments (1935), Fisher presented
20016-432: The use of more general probability measures . A probability distribution whose sample space is one-dimensional (for example real numbers, list of labels, ordered labels or binary) is called univariate , while a distribution whose sample space is a vector space of dimension 2 or more is called multivariate . A univariate distribution gives the probabilities of a single random variable taking on various different values;
20160-463: The values which the random variable may take. Thus the cumulative distribution function has the form F ( x ) = P ( X ≤ x ) = ∑ ω ≤ x p ( ω ) . {\displaystyle F(x)=P(X\leq x)=\sum _{\omega \leq x}p(\omega ).} The points where the cdf jumps always form a countable set; this may be any countable set and thus may even be dense in
20304-450: The variance can be expanded as follows: In other words, the variance of X is equal to the mean of the square of X minus the square of the mean of X . This equation should not be used for computations using floating point arithmetic , because it suffers from catastrophic cancellation if the two components of the equation are similar in magnitude. For other numerically stable alternatives, see algorithms for calculating variance . If
20448-536: The variance in situations where the CDF, but not the density , can be conveniently expressed. The second moment of a random variable attains the minimum value when taken around the first moment (i.e., mean) of the random variable, i.e. a r g m i n m E ( ( X − m ) 2 ) = E ( X ) {\displaystyle \mathrm {argmin} _{m}\,\mathrm {E} \left(\left(X-m\right)^{2}\right)=\mathrm {E} (X)} . Conversely, if
20592-423: The variance of X is The general formula for the variance of the outcome, X , of an n -sided die is The following table lists the variance for some commonly used probability distributions. Variance is non-negative because the squares are positive or zero: The variance of a constant is zero. Conversely, if the variance of a random variable is 0, then it is almost surely a constant. That is, it always has
20736-439: The widespread use of random variables , which transform the sample space into a set of numbers (e.g., R {\displaystyle \mathbb {R} } , N {\displaystyle \mathbb {N} } ), it is more common to study probability distributions whose argument are subsets of these particular kinds of sets (number sets), and all probability distributions discussed in this article are of this type. It