In statistics, robust measures of scale are methods that quantify the statistical dispersion in a sample of numerical data while resisting outliers. The most common such robust statistics are the interquartile range (IQR) and the median absolute deviation (MAD). These are contrasted with conventional or non-robust measures of scale, such as the sample standard deviation, which are greatly influenced by outliers.
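As a rough illustration of this contrast, the following Python sketch compares the three measures on a small sample before and after one value is replaced by a gross outlier. The sample values and the helper names `iqr` and `mad` are invented for illustration, and the quartile convention (`method="inclusive"`) is only one of several in common use.

```python
import statistics


def iqr(data):
    """Interquartile range: 75th percentile minus 25th percentile."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q3 - q1


def mad(data):
    """Median of the absolute deviations from the sample median."""
    center = statistics.median(data)
    return statistics.median([abs(x - center) for x in data])


# Hypothetical measurements; the second list replaces one value by an outlier.
clean = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 10.1, 9.9]
contaminated = clean[:-1] + [1000.0]

for name, sample in (("clean", clean), ("contaminated", contaminated)):
    print(
        name,
        "stdev =", round(statistics.stdev(sample), 3),
        "IQR =", round(iqr(sample), 3),
        "MAD =", round(mad(sample), 3),
    )
```

In this sketch the standard deviation grows by several orders of magnitude after contamination, while the IQR and the MAD stay close to their values on the clean sample.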
QN or qn may refer to: $Q_n$, one of several robust measures of scale in statistics; ATCvet code QN Nervous system, a section of the Anatomical Therapeutic Chemical Classification System for veterinary medicinal products; QN connector, a type of coaxial RF connector; Queen's Nurse (QN), an honorary title awarded by the Queen's Nursing Institute (QNI) to community nurses; Queen regnant (Qn.), in
a scale parameter, and have the advantages of both robustness and superior efficiency on contaminated data, at the cost of inferior efficiency on clean data from distributions such as the normal distribution. To illustrate robustness, the standard deviation can be made arbitrarily large by increasing exactly one observation (it has a breakdown point of 0, as it can be contaminated by a single point),
a confidence interval. The 200 extra weighings served only to detect and correct for operator error and did nothing to improve the confidence interval. With more repetitions, one could use a truncated mean, discarding the largest and smallest values and averaging the rest. A bootstrap calculation could be used to determine a confidence interval narrower than that calculated from σ, and so obtain some benefit from
a defect that is not shared by robust statistics. One of the most common robust measures of scale is the interquartile range (IQR), the difference between the 75th percentile and the 25th percentile of a sample; this is the 25% trimmed range, an example of an L-estimator. Other trimmed ranges, such as the interdecile range (10% trimmed range), can also be used. For a Gaussian distribution, IQR
a large amount of extra work. These procedures are robust against procedural errors which are not modeled by the assumption that the balance has a fixed known standard deviation σ. In practical applications where the occasional operator error can occur, or the balance can malfunction, the assumptions behind simple statistical calculations cannot be taken for granted. Before trusting the results of 100 objects weighed just three times each to have confidence intervals calculated from σ, it
a normal distribution with standard deviation σ to simulate the situation; this can be done in Microsoft Excel using =NORMINV(RAND(),0,σ), and the same techniques can be used in other spreadsheet programs such as OpenOffice.org Calc and Gnumeric. After removing obvious outliers, one could subtract the median from the other two values for each object, and examine the distribution of
a normal distribution, $S_n$ is approximately unbiased for the population standard deviation even down to very modest sample sizes (<1% bias for $n = 10$). For a large sample from a normal distribution, $2.22\,Q_n$ is approximately unbiased for the population standard deviation. For small or moderate samples, the expected value of $Q_n$ under a normal distribution depends markedly on
a time, and repeated the whole process ten times. Then the operator can calculate a sample standard deviation for each object, and look for outliers. Any object with an unusually large standard deviation probably has an outlier in its data. These can be removed by various non-parametric techniques. If the operator repeated the process only three times, simply taking the median of the three measurements and using σ would give
is 1, whereas the population variance does not exist. These robust estimators typically have inferior statistical efficiency compared to conventional estimators for data drawn from a distribution without outliers (such as a normal distribution), but have superior efficiency for data drawn from a mixture distribution or from a heavy-tailed distribution, for which non-robust measures such as
is a constant scale factor, which depends on the distribution. For normally distributed data $k$ is taken to be $k = 1/\Phi^{-1}(3/4) \approx 1.4826$, i.e., the reciprocal of the quantile function $\Phi^{-1}$ (also known as the inverse of the cumulative distribution function) for the standard normal distribution $Z = (X - \mu)/\sigma$. The argument 3/4
is a constant depending on $n$. These can be computed in $O(n \log n)$ time and $O(n)$ space. Neither of these requires location estimation, as they are based only on differences between values. They are both more efficient than the MAD under a Gaussian distribution: $S_n$ is 58% efficient, while $Q_n$ is 82% efficient. For a sample from
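The pairwise-difference idea behind $S_n$ and $Q_n$ can be sketched naively in Python. This is a simplified $O(n^2)$ illustration, not the $O(n \log n)$ algorithm mentioned in the text: it uses plain medians instead of the low and high medians of the original definition, it omits finite-sample correction factors, and the function names are invented here. The constants 1.1926 and 2.2219 are assumed large-sample consistency factors for normally distributed data.

```python
import random
import statistics


def sn_naive(data, c=1.1926):
    """Naive S_n sketch: c * median over i of (median over j of |x_i - x_j|)."""
    inner = [statistics.median([abs(xi - xj) for xj in data]) for xi in data]
    return c * statistics.median(inner)


def qn_naive(data, c=2.2219):
    """Naive Q_n sketch: c * k-th smallest pairwise distance |x_i - x_j|, i < j.

    With h = n // 2 + 1 and k = h * (h - 1) / 2, the k-th order statistic is
    roughly the first quartile of the pairwise distances.
    """
    n = len(data)
    diffs = sorted(
        abs(data[i] - data[j]) for i in range(n) for j in range(i + 1, n)
    )
    h = n // 2 + 1
    k = h * (h - 1) // 2
    return c * diffs[k - 1]


# Hypothetical check on simulated standard normal data (true sigma = 1).
rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(400)]
print("S_n estimate:", sn_naive(sample))
print("Q_n estimate:", qn_naive(sample))
```

Neither function computes a location estimate such as the mean or median of the data itself; both work only with differences between observations, as the text describes.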
is a measure of statistical dispersion. Moreover, the MAD is a robust statistic, being more resilient to outliers in a data set than the standard deviation. In the standard deviation, the distances from the mean are squared, so large deviations are weighted more heavily, and thus outliers can heavily influence it. In the MAD, the deviations of a small number of outliers are irrelevant. Because
is necessary to test for and remove a reasonable number of outliers (testing the assumption that the operator is careful and correcting for the fact that he is not perfect), and to test the assumption that the data really have a normal distribution with standard deviation σ. The theoretical analysis of such an experiment is complicated, but it is easy to set up a spreadsheet which draws random numbers from
is related to $\sigma$ as: $$\sigma \approx 0.7413\,\operatorname{IQR} = \operatorname{IQR}/1.349.$$ Another familiar robust measure of scale is the median absolute deviation (MAD), the median of the absolute values of the differences between the data values and the overall median of the data set; for a Gaussian distribution, MAD is related to $\sigma$ as: $$\sigma \approx 1.4826\,\operatorname{MAD} \approx \operatorname{MAD}/0.6745.$$ See Median absolute deviation#Relation to standard deviation for details. Robust measures of scale can be used as estimators of properties of
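is such that $\pm\operatorname{MAD}$ covers 50% (between 1/4 and 3/4) of the standard normal cumulative distribution function, i.e. $$\frac{1}{2} = P(|X - \mu| \le \operatorname{MAD}) = P\left(\left|\frac{X - \mu}{\sigma}\right| \le \frac{\operatorname{MAD}}{\sigma}\right) = P\left(|Z| \le \frac{\operatorname{MAD}}{\sigma}\right).$$ Therefore, we must have that $$\Phi(\operatorname{MAD}/\sigma) - \Phi(-\operatorname{MAD}/\sigma) = 1/2.$$ Noticing that $$\Phi(-\operatorname{MAD}/\sigma) = 1 - \Phi(\operatorname{MAD}/\sigma),$$ we have that $\operatorname{MAD}/\sigma = \Phi^{-1}(3/4) = 0.67449$, from which we obtain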
480-582: The median absolute deviation ( MAD ) is a robust measure of the variability of a univariate sample of quantitative data . It can also refer to the population parameter that is estimated by the MAD calculated from a sample. For a univariate data set X 1 , X 2 , ..., X n , the MAD is defined as the median of the absolute deviations from the data's median X ~ = median ( X ) {\displaystyle {\tilde {X}}=\operatorname {median} (X)} : that is, starting with
510-429: The residuals (deviations) from the data's median, the MAD is the median of their absolute values . Consider the data (1, 1, 2, 2 , 4, 6, 9). It has a median value of 2. The absolute deviations about 2 are (1, 1, 0, 0, 2, 4, 7) which in turn have a median value of 1 (because the sorted absolute deviations are (0, 0, 1, 1 , 2, 4, 7)). So the median absolute deviation for this data is 1. The median absolute deviation
540-549: The 200 resulting numbers. It should be normal with mean near zero and standard deviation a little larger than σ. A simple Monte Carlo spreadsheet calculation would reveal typical values for the standard deviation (around 105 to 115% of σ). Or, one could subtract the mean of each triplet from the values, and examine the distribution of 300 values. The mean is identically zero, but the standard deviation should be somewhat smaller (around 75 to 85% of σ). Median absolute deviation#Relation to standard deviation In statistics ,
the Christian Church, following the name of a Christian saint who was a Queen; Queer Nation (QN), a United States LGBT social movement; Quintillion (qn), a large number; Quotidiano Nazionale, an Italian online newspaper.
the MAD is a more robust estimator of scale than the sample variance or standard deviation, it works better with distributions without a mean or variance, such as the Cauchy distribution. The MAD may be used similarly to how one would use the deviation for the average. In order to use the MAD as a consistent estimator for the estimation of the standard deviation $\sigma$, one takes $\hat{\sigma} = k \cdot \operatorname{MAD}$, where $k$
630-411: The confidence interval so that they are not badly affected by outlying or aberrant observations in a data-set. In the process of weighing 1000 objects, under practical conditions, it is easy to believe that the operator might make a mistake in procedure and so report an incorrect mass (thereby making one type of systematic error ). Suppose there were 100 objects and the operator weighed them all, one at
#1732787916076660-409: The identical result as the univariate MAD in one dimension and generalizes to any number of dimensions. MADGM needs the geometric median to be found, which is done by an iterative process. The population MAD is defined analogously to the sample MAD, but is based on the complete population rather than on a sample. For a symmetric distribution with zero mean, the population MAD is the 75th percentile of
690-433: The link to point directly to the intended article. Retrieved from " https://en.wikipedia.org/w/index.php?title=QN&oldid=1257133204 " Category : Disambiguation pages Hidden categories: Short description is different from Wikidata All article disambiguation pages All disambiguation pages Robust measures of scale These robust statistics are particularly used as estimators of
720-461: The median increases, with points more than 9 MAD units from the median having no influence at all. Mizera & Müller (2004) propose a robust depth-based estimator for location and scale simultaneously. They propose a new measure named the Student median. A robust confidence interval is a robust modification of confidence intervals , meaning that one modifies the non-robust calculations of
750-422: The population standard deviation if the data follow a normal distribution . In other situations, it makes more sense to think of a robust measure of scale as an estimator of its own expected value , interpreted as an alternative to the population standard deviation as a measure of scale. For example, the MAD of a sample from a standard Cauchy distribution is an estimator of the population MAD, which in this case
780-502: The population, either for parameter estimation or as estimators of their own expected value . For example, robust estimators of scale are used to estimate the population standard deviation , generally by multiplying by a scale factor to make it an unbiased consistent estimator ; see scale parameter: estimation . For example, dividing the IQR by 2 √ 2 erf (1/2) (approximately 1.349), makes it an unbiased, consistent estimator for
810-442: The relation of MAD to the standard deviation is unchanged for normally distributed data. Analogously to how the median generalizes to the geometric median (GM) in multivariate data , MAD can be generalized to the median of distances to GM (MADGM) in n dimensions. This is done by replacing the absolute differences in one dimension by Euclidean distances of the data points to the geometric median in n dimensions. This gives
840-468: The sample size, so finite-sample correction factors (obtained from a table or from simulations) are used to calibrate the scale of Q n . Like S n and Q n , the biweight midvariance aims to be robust without sacrificing too much efficiency. It is defined as where I is the indicator function , Q is the sample median of the X i , and Its square root is a robust estimator of scale, since data points are downweighted as their distance from
870-400: The scale factor k = 1 / Φ − 1 ( 3 / 4 ) = 1.4826 {\displaystyle k=1/\Phi ^{-1}(3/4)=1.4826} . Another way of establishing the relationship is noting that MAD equals the half-normal distribution median: This form is used in, e.g., the probable error . In the case of complex values ( X +i Y ),
900-585: The standard deviation should not be used. For example, for data drawn from the normal distribution, the MAD is 37% as efficient as the sample standard deviation, while the Rousseeuw–Croux estimator Q n is 88% as efficient as the sample standard deviation. Rousseeuw and Croux propose alternatives to the MAD, motivated by two weaknesses of it: They propose two alternative statistics based on pairwise differences: S n and Q n , defined as: where c n {\displaystyle c_{n}}