The Kolmogorov–Smirnov test (K–S test or KS test) is a nonparametric test of the equality of continuous (or discontinuous, see Section 2.2), one-dimensional probability distributions. It can be used to test whether a sample came from a given reference probability distribution (one-sample K–S test), or whether two samples came from the same distribution (two-sample K–S test). Intuitively, the test provides a method to qualitatively answer the question "How likely is it that we would see a collection of samples like this if they were drawn from that probability distribution?" or, in the second case, "How likely is it that we would see two sets of samples like this if they were drawn from the same (but unknown) probability distribution?". It is named after Andrey Kolmogorov and Nikolai Smirnov.
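As a quick, hedged illustration of both flavours of the test, the sketch below uses SciPy's stock implementations (scipy.stats.kstest and scipy.stats.ks_2samp); the sample sizes, distributions, and seed are arbitrary choices for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=200)   # sample for the one-sample test
y = rng.uniform(low=-2.0, high=2.0, size=150)  # second sample for the two-sample test

# One-sample K-S test: was x drawn from a standard normal distribution?
print(stats.kstest(x, "norm"))

# Two-sample K-S test: were x and y drawn from the same distribution?
print(stats.ks_2samp(x, y))
```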
The Kolmogorov–Smirnov statistic quantifies a distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution, or between the empirical distribution functions of two samples. The null distribution of this statistic is calculated under the null hypothesis that the sample is drawn from the reference distribution (in
$a,b\in\mathbb{C}$. If $a=0$, then $b=\tau$ and $f(z)=e^{2\pi iz}$. If $a=-2\pi i$, then $f(z)=C\vartheta\!\left(z+\tfrac{1}{2}\tau+b,\tau\right)$ for some nonzero $C\in\mathbb{C}$. The Jacobi theta function defined above
a function called a metric or distance function. Metric spaces are the most general setting for studying many of the concepts of mathematical analysis and geometry. The most familiar example of a metric space is 3-dimensional Euclidean space with its usual notion of distance. Other well-known examples are a sphere equipped with the angular distance and the hyperbolic plane. A metric may correspond to
a goodness of fit test. In the special case of testing for normality of the distribution, samples are standardized and compared with a standard normal distribution. This is equivalent to setting the mean and variance of the reference distribution equal to the sample estimates, and it is known that using these to define the specific reference distribution changes the null distribution of the test statistic (see Test with estimated parameters). Various studies have found that, even in this corrected form,
a metaphorical, rather than physical, notion of distance: for example, the set of 100-character Unicode strings can be equipped with the Hamming distance, which measures the number of characters that need to be changed to get from one string to another. Since they are very general, metric spaces are a tool used in many different branches of mathematics. Many types of mathematical objects have
a metric space is an ordered pair (M, d) where M is a set and d is a metric on M, i.e., a function $d\colon M\times M\to\mathbb{R}$ satisfying the following axioms for all points $x,y,z\in M$: If
a "structure-preserving" map is one that fully preserves the distance function: It follows from the metric space axioms that a distance-preserving function is injective. A bijective distance-preserving function is called an isometry. One perhaps non-obvious example of an isometry between spaces described in this article is the map $f\colon(\mathbb{R}^2,d_1)\to(\mathbb{R}^2,d_\infty)$ defined by $f(x,y)=(x+y,\,x-y)$. If there
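A numerical spot-check of this isometry (random point pairs, an illustration rather than a proof) can be sketched as follows; the metric definitions mirror the $d_1$ and $d_\infty$ formulas given elsewhere in this article.

```python
import numpy as np

rng = np.random.default_rng(1)

def d1(p, q):    # taxicab metric
    return np.abs(p - q).sum()

def dinf(p, q):  # Chebyshev metric
    return np.abs(p - q).max()

def f(p):        # the candidate isometry (R^2, d1) -> (R^2, d_inf)
    return np.array([p[0] + p[1], p[0] - p[1]])

for _ in range(1000):
    p, q = rng.normal(size=2), rng.normal(size=2)
    # d_inf(f(p), f(q)) = max(|dx + dy|, |dx - dy|) = |dx| + |dy| = d1(p, q)
    assert np.isclose(d1(p, q), dinf(f(p), f(q)))
```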
a broader and more flexible way. This was important for the growing field of functional analysis. Mathematicians like Hausdorff and Stefan Banach further refined and expanded the framework of metric spaces: Hausdorff introduced topological spaces as a generalization of metric spaces, and Banach's work in functional analysis relied heavily on the metric structure. Over time, metric spaces became
a central part of modern mathematics. They have influenced various fields including topology, geometry, and applied mathematics, and they continue to play a crucial role in the study of abstract mathematical concepts. A distance function is enough to define notions of closeness and convergence that were first developed in real analysis. Properties that depend on the structure of
a characterization of metrizability in terms of other topological properties, without reference to metrics. Convergence of sequences in Euclidean space is defined as follows: Convergence of sequences in a topological space is defined as follows: In metric spaces, both of these definitions make sense and they are equivalent. This is a general pattern for topological properties of metric spaces: while they can be defined in
a complex torus, a condition of descent. One interpretation of theta functions when dealing with the heat equation is that "a theta function is a special function that describes the evolution of temperature on a segment domain subject to certain boundary conditions". Throughout this article, $(e^{\pi i\tau})^{\alpha}$ should be interpreted as $e^{\alpha\pi i\tau}$ (in order to resolve issues of choice of branch). There are several closely related functions called Jacobi theta functions, and many different and incompatible systems of notation for them. One Jacobi theta function (named after Carl Gustav Jacob Jacobi)
a critical value of the test statistic $D_\alpha$ such that $P(D_n>D_\alpha)=\alpha$, then a band of width $\pm D_\alpha$ around $F_n(x)$ will entirely contain $F(x)$ with probability $1-\alpha$. A distribution-free multivariate Kolmogorov–Smirnov goodness of fit test has been proposed by Justel, Peña and Zamar (1997). The test uses a statistic which is built using Rosenblatt's transformation, and an algorithm
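A minimal sketch of such a confidence band, assuming a recent SciPy (scipy.stats.kstwo is the exact finite-n distribution of D_n); the normal sample and the seed are illustrative only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, alpha = 100, 0.05
x = np.sort(rng.normal(size=n))

ecdf = np.arange(1, n + 1) / n             # F_n evaluated at the order statistics
d_alpha = stats.kstwo(n).ppf(1 - alpha)    # critical value: P(D_n > d_alpha) = alpha

lower = np.clip(ecdf - d_alpha, 0.0, 1.0)  # band F_n(x) +/- d_alpha, clipped to [0, 1]
upper = np.clip(ecdf + d_alpha, 0.0, 1.0)

# Check containment of the true cdf at the sample points (here the null is true).
inside = (stats.norm.cdf(x) >= lower) & (stats.norm.cdf(x) <= upper)
print(d_alpha, inside.all())
```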
a fit with minimum KS. In this case we should reject H0, which is often the case with MLE, because the sample standard deviation might be very large for T-2 data; with KS minimization, however, the resulting KS statistic may still be too low to reject H0. In the Student-T case, a modified KS test with a KS estimate instead of the MLE makes the KS test indeed slightly worse. However, in other cases such a modified KS test leads to slightly better test power. Under
a large bias error on sigma. Using a moment fit or KS minimization instead has a large impact on the critical values, and also some impact on test power. If we need to decide, via the KS test, whether Student-T data with df = 2 could be normal or not, then an ML estimate based on H0 (the data are normal, so the standard deviation is used for scale) would give a much larger KS distance than
a metric space are referred to as metric properties. Every metric space is also a topological space, and some metric properties can also be rephrased without reference to distance in the language of topology; that is, they are really topological properties. For any point x in a metric space M and any real number r > 0, the open ball of radius r around x is defined to be
a metric space by measuring distances the same way we would in M. Formally, the induced metric on A is a function $d_A\colon A\times A\to\mathbb{R}$ defined by $d_A(x,y)=d(x,y)$. For example, if we take
a natural notion of distance and therefore admit the structure of a metric space, including Riemannian manifolds, normed vector spaces, and graphs. In abstract algebra, the p-adic numbers arise as elements of the completion of a metric structure on the rational numbers. Metric spaces are also studied in their own right in metric geometry and analysis on metric spaces. Many of
a product formula for the theta function in the form In terms of w and q: where $(\,\cdot\,;\,\cdot\,)_\infty$ is the q-Pochhammer symbol and $\theta(\,\cdot\,;\,\cdot\,)$ is the q-theta function. Expanding terms out, the Jacobi triple product can also be written which we may also write as This form is valid in general, but clearly it is of particular interest when z is real. Similar product formulas for
a purely topological way, there is often a way that uses the metric which is easier to state or more familiar from real analysis. Informally, a metric space is complete if it has no "missing points": every sequence that looks like it should converge to something actually converges. To make this precise: a sequence $(x_n)$ in a metric space M is Cauchy if for every $\varepsilon>0$ there
a relevant reference at Euler function. The Ramanujan results quoted at Euler function, plus a few elementary operations, give the results below, so they are either in Ramanujan's lost notebook or follow immediately from it. See also Yi (2004). Define, with the nome $q=e^{\pi i\tau}$, $\tau=n\sqrt{-1}$, and Dedekind eta function $\eta(\tau)$. Then for $n=1,2,3,\dots$ If
a special case of this for the normal distribution. The logarithm transformation may help to overcome cases where the Kolmogorov test data does not seem to fit the assumption that it came from the normal distribution. When using estimated parameters, the question arises which estimation method should be used. Usually this would be the maximum likelihood method, but e.g. for the normal distribution MLE has
a totally unacceptable $7\%$ when $n=10$. However, a very simple expedient of replacing $x$ by a corrected value in the argument of the Jacobi theta function reduces these errors to $0.003\%$, $0.027\%$, and $0.27\%$ respectively; such accuracy would usually be considered more than adequate for all practical applications. The goodness-of-fit test or
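The error figures above can be probed numerically. The rough companion below (not a reproduction of the cited analysis) compares the exact finite-n distribution of D_n (scipy.stats.kstwo, available in recent SciPy versions) with the asymptotic Kolmogorov approximation (scipy.stats.kstwobign) at the exact 95th percentile.

```python
import numpy as np
from scipy import stats

for n in (10, 100, 1000):
    d = stats.kstwo(n).ppf(0.95)                # exact critical value
    exact = stats.kstwo(n).sf(d)                # 0.05 by construction
    asymp = stats.kstwobign.sf(np.sqrt(n) * d)  # limiting approximation
    print(n, exact, asymp, abs(asymp - exact) / exact)
```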
a way of measuring distances between them. Taking the completion of this metric space gives a new set of functions which may be less nice, but nevertheless useful because they behave similarly to the original nice functions in important ways. For example, weak solutions to differential equations typically live in a completion (a Sobolev space) rather than the original space of nice functions for which
is uniformly continuous if for every real number $\varepsilon>0$ there exists $\delta>0$ such that for all points x and y in $M_1$ with $d_1(x,y)<\delta$, we have $d_2(f(x),f(y))<\varepsilon$. The only difference between this definition and
is K-Lipschitz if $d_2(f(x),f(y))\le K\,d_1(x,y)$ for all $x,y\in M_1$. Lipschitz maps are particularly important in metric geometry, since they provide more flexibility than distance-preserving maps, but still make essential use of
is Lebesgue's number lemma, which shows that for any open cover of a compact space, every point is relatively deep inside one of the sets of the cover. Unlike in the case of topological spaces or algebraic structures such as groups or rings, there is no single "right" type of structure-preserving function between metric spaces. Instead, one works with different types of functions depending on one's goals. Throughout this section, suppose that $(M_1,d_1)$ and $(M_2,d_2)$ are two metric spaces. The words "function" and "map" are used interchangeably. One interpretation of
is not a topological property, since $\mathbb{R}$ is complete but the homeomorphic space (0, 1) is not. This notion of "missing points" can be made precise. In fact, every metric space has a unique completion, which is a complete space that contains the given space as a dense subset. For example, [0, 1] is the completion of (0, 1), and
is (e.g. whether it is normal or not). Again, tables of critical values have been published. A shortcoming of the univariate Kolmogorov–Smirnov test is that it is not very powerful, because it is devised to be sensitive against all possible types of differences between two distribution functions. Some argue that the Cucconi test, originally proposed for simultaneously comparing location and scale, can be much more powerful than
is 1. Fast and accurate algorithms to compute the cdf $\operatorname{Pr}(D_n\le x)$ or its complement for arbitrary $n$ and $x$ are available. If either the form or the parameters of F(x) are determined from
is a neighborhood of x (informally, it contains all points "close enough" to x) if it contains an open ball of radius r around x for some r > 0. An open set is a set which is a neighborhood of all its points. It follows that the open balls form a base for a topology on M. In other words, the open sets of M are exactly the unions of open balls. As in any topology, closed sets are
is a Fourier series for a 1-periodic entire function of z. Accordingly, the theta function is 1-periodic in z: $\vartheta(z+1;\tau)=\vartheta(z;\tau)$. By completing the square, it is also $\tau$-quasiperiodic in z, with $\vartheta(z+\tau;\tau)=e^{-\pi i\tau-2\pi iz}\,\vartheta(z;\tau)$. Thus, in general, $\vartheta(z+a+b\tau;\tau)=e^{-\pi ib^{2}\tau-2\pi ibz}\,\vartheta(z;\tau)$ for any integers a and b. For any fixed $\tau$, the function is an entire function on the complex plane, so by Liouville's theorem, it cannot be doubly periodic in $1,\tau$ unless it
is a continuous bijection whose inverse is also continuous; if there is a homeomorphism between $M_1$ and $M_2$, they are said to be homeomorphic. Homeomorphic spaces are the same from the point of view of topology, but may have very different metric properties. For example, $\mathbb{R}$ is unbounded and complete, while (0, 1) is bounded but not complete. A function $f\colon M_1\to M_2$
is a function defined for two complex variables z and τ, where z can be any complex number and τ is the half-period ratio, confined to the upper half-plane, which means it has a positive imaginary part. It is given by the formula $\vartheta(z;\tau)=\sum_{n=-\infty}^{\infty}\exp\!\left(\pi in^{2}\tau+2\pi inz\right)=\sum_{n=-\infty}^{\infty}q^{n^{2}}\eta^{n},$ where q = exp(πiτ) is the nome and η = exp(2πiz). It is a Jacobi form. The restriction on τ ensures that it is an absolutely convergent series. At fixed τ, this
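The series can be evaluated directly by truncation. The sketch below checks a hand-rolled sum against mpmath's built-in jtheta, using the identity ϑ(z; τ) = jtheta(3, πz, q) with nome q = exp(πiτ); the truncation bound N = 40 and the sample point are arbitrary choices.

```python
import mpmath as mp

def theta(z, tau, N=40):
    """Truncated series for theta(z; tau) = sum_n q^(n^2) * eta^n."""
    q = mp.exp(mp.pi * 1j * tau)
    eta = mp.exp(2j * mp.pi * z)
    return mp.fsum(q**(n * n) * eta**n for n in range(-N, N + 1))

z, tau = mp.mpc(0.3, 0.1), mp.mpc(0.25, 1.0)   # tau in the upper half-plane
q = mp.exp(mp.pi * 1j * tau)

print(theta(z, tau))
print(mp.jtheta(3, mp.pi * z, q))              # should agree to high precision
```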
is an integer N such that for all m, n > N, $d(x_m,x_n)<\varepsilon$. By the triangle inequality, any convergent sequence is Cauchy: if $x_m$ and $x_n$ are both less than ε away from the limit, then they are less than 2ε away from each other. If the converse is true (every Cauchy sequence in M converges), then M is complete. Euclidean spaces are complete, as is $\mathbb{R}^2$ with
is an isometry between the spaces $M_1$ and $M_2$, they are said to be isometric. Metric spaces that are isometric are essentially identical. On the other end of the spectrum, one can forget entirely about the metric structure and study continuous maps, which only preserve topological structure. There are several equivalent definitions of continuity for metric spaces. The most important are: A homeomorphism
is bounded. To see this, start with a finite cover by r-balls for some arbitrary r. Since the subset of M consisting of the centers of these balls is finite, it has finite diameter, say D. By the triangle inequality, the diameter of the whole space is at most D + 2r. The converse does not hold: an example of a metric space that is bounded but not totally bounded is $\mathbb{R}^2$ (or any other infinite set) with
is constant, and so the best we could do is to make it periodic in $1$ and quasi-periodic in $\tau$. Indeed, since $\left|\frac{\vartheta(z+a+b\tau;\tau)}{\vartheta(z;\tau)}\right|=\exp\!\left(\pi\left(b^{2}\Im(\tau)+2b\,\Im(z)\right)\right)$ and $\Im(\tau)>0$,
is defined as $F_n(x)=\frac{1}{n}\sum_{i=1}^{n}\mathbf{1}_{(-\infty,x]}(X_i),$ where $\mathbf{1}_{(-\infty,x]}(X_i)$ equals 1 if $X_i\le x$ and 0 otherwise. The Kolmogorov–Smirnov statistic for a given cumulative distribution function F(x) is $D_n=\sup_x\left|F_n(x)-F(x)\right|,$ where $\sup_x$ is the supremum of the set of distances. Intuitively, the statistic takes the largest absolute difference between the two distribution functions across all x values. By the Glivenko–Cantelli theorem, if the sample comes from distribution F(x), then $D_n$ converges to 0 almost surely in
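Computing $D_n$ from the definition reduces to a scan over the order statistics, via the standard identity $D_n=\max_i \max\!\left(i/n-F(x_{(i)}),\,F(x_{(i)})-(i-1)/n\right)$; the sketch below cross-checks against scipy.stats.kstest. The sample and null distribution are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = np.sort(rng.normal(size=500))
n = x.size

cdf = stats.norm.cdf(x)                          # F at the order statistics
d_plus = np.max(np.arange(1, n + 1) / n - cdf)   # F_n above F
d_minus = np.max(cdf - np.arange(0, n) / n)      # F_n below F
d_n = max(d_plus, d_minus)

print(d_n, stats.kstest(x, "norm").statistic)    # the two values should match
```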
is defined by $d_1((x_1,y_1),(x_2,y_2))=|x_2-x_1|+|y_2-y_1|$ and can be thought of as
is developed to compute it in the bivariate case. An approximate test that can be easily computed in any dimension is also presented. The Kolmogorov–Smirnov test statistic needs to be modified if a similar test is to be applied to multivariate data. This is not straightforward because the maximum difference between two joint cumulative distribution functions is not generally the same as the maximum difference of any of
is due to Peacock (see also Gosset for a 3D version) and another to Fasano and Franceschini (see Lopes et al. for a comparison and computational details). Critical values for the test statistic can be obtained by simulations, but depend on the dependence structure in the joint distribution. In one dimension, the Kolmogorov–Smirnov statistic is identical to the so-called star discrepancy D, so another native KS extension to higher dimensions would be simply to use D also for higher dimensions. Unfortunately,
is entire and nonconstant, and satisfies the functional equations $\begin{cases}f(z+1)=f(z)\\f(z+\tau)=e^{az+2\pi ib}f(z)\end{cases}$ for some constant
is finite is not very impressive: even when $n=1000$, the corresponding maximum error is about $0.9\%$; this error increases to $2.6\%$ when $n=100$ and to
is purely discrete or mixed, implemented in C++ and in the KSgeneral package of the R language. The functions disc_ks_test(), mixed_ks_test() and cont_ks_test() also compute the KS test statistic and p-values for purely discrete, mixed or continuous null distributions and arbitrary sample sizes. The KS test and its p-values for discrete null distributions and small sample sizes are also computed as part of
is rejected at level $\alpha$ if $D_{n,m}>c(\alpha)\sqrt{\frac{n+m}{n\,m}},$ where $n$ and $m$ are the sizes of the first and second sample respectively. The value of $c(\alpha)$ is given in the table below for the most common levels of $\alpha$, and in general by $c(\alpha)=\sqrt{-\ln\!\left(\tfrac{\alpha}{2}\right)\cdot\tfrac{1}{2}},$ so that
is sometimes considered along with three auxiliary theta functions, in which case it is written with a double 0 subscript, as $\vartheta_{00}$: The auxiliary (or half-period) functions are defined by This notation follows Riemann and Mumford; Jacobi's original formulation was in terms of the nome $q=e^{\pi i\tau}$ rather than τ. In Jacobi's notation the θ-functions are written: The above definitions of the Jacobi theta functions are by no means unique. See Jacobi theta functions (notational variations) for further discussion. If we set z = 0 in
is that occurring in the theory of elliptic functions. With respect to one of the complex variables (conventionally called z), a theta function has a property expressing its behavior with respect to the addition of a period of the associated elliptic functions, making it a quasiperiodic function. In the abstract theory this quasiperiodicity comes from the cohomology class of a line bundle on
is the Rogers–Ramanujan continued fraction: The mathematician Bruce Berndt found further values of the theta function. Many values of the theta function, and especially of the phi function shown, can be represented in terms of the gamma function. For the transformation of the nome in the theta functions these formulas can be used: The squares of the three theta zero-value functions with
is the Brownian bridge. If F is continuous, then under the null hypothesis $\sqrt{n}D_n$ converges to the Kolmogorov distribution, which does not depend on F. This result may also be known as the Kolmogorov theorem. The accuracy of this limit as an approximation to the exact cdf of $K$ when $n$
is used. One might require that the result of the test should not depend on which choice is made. One approach to generalizing the Kolmogorov–Smirnov statistic to higher dimensions which meets the above concern is to compare the cdfs of the two samples with all possible orderings, and take the largest of the set of resulting KS statistics. In d dimensions, there are $2^d-1$ such orderings. One such variation
the Heine–Cantor theorem states that if $M_1$ is compact, then every continuous map is uniformly continuous. In other words, uniform continuity cannot distinguish any non-topological features of compact metric spaces. A Lipschitz map is one that stretches distances by at most a bounded factor. Formally, given a real number K > 0, the map $f\colon M_1\to M_2$
the Lemniscate constant is represented. Note that the following modular identities hold: where $s(q)=s\!\left(e^{\pi i\tau}\right)=-R\!\left(-e^{-\pi i/(5\tau)}\right)$
the Macdonald identities) tells us that for complex numbers w and q with |q| < 1 and w ≠ 0 we have $\prod_{m=1}^{\infty}\left(1-q^{2m}\right)\left(1+w^{2}q^{2m-1}\right)\left(1+w^{-2}q^{2m-1}\right)=\sum_{n=-\infty}^{\infty}w^{2n}q^{n^{2}}.$ It can be proven by elementary means, as for instance in Hardy and Wright's An Introduction to the Theory of Numbers. If we express the theta function in terms of the nome $q=e^{\pi i\tau}$ (noting some authors instead set $q=e^{2\pi i\tau}$) and take $w=e^{\pi iz}$, then We therefore obtain
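The identity is easy to check numerically; the sketch below truncates both sides (M product factors, |n| ≤ N) at arbitrary complex w and q with |q| < 1.

```python
import mpmath as mp

w, q = mp.mpc(0.7, 0.2), mp.mpc(0.1, 0.05)   # |q| < 1, w != 0
M = N = 60                                   # arbitrary truncation depths

lhs = mp.fprod((1 - q**(2 * m))
               * (1 + w**2 * q**(2 * m - 1))
               * (1 + w**-2 * q**(2 * m - 1)) for m in range(1, M + 1))
rhs = mp.fsum(w**(2 * n) * q**(n * n) for n in range(-N, N + 1))

print(lhs, rhs)   # the truncations agree to many digits since |q| < 1
```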
the modular group, which is generated by τ ↦ τ + 1 and τ ↦ −1/τ. Equations for the first transform are easily found, since adding one to τ in the exponent has the same effect as adding 1/2 to z ($n\equiv n^2 \bmod 2$). For the second, let Then Instead of expressing the theta functions in terms of z and τ, we may express them in terms of arguments w and
the nome $q=e^{\pi i\tau}$. Observe that $\theta_1(q)=0$. These can be used to define a variety of modular forms, and to parametrize certain curves; in particular, the Jacobi identity is $\theta_2(q)^4+\theta_4(q)^4=\theta_3(q)^4,$ or equivalently, $\left(\frac{\theta_2(q)}{\theta_3(q)}\right)^{4}+\left(\frac{\theta_4(q)}{\theta_3(q)}\right)^{4}=1,$ which is the Fermat curve of degree four. Jacobi's identities describe how theta functions transform under
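A quick mpmath spot-check of the identity at a few nome values (z = 0 theta constants via jtheta; the q values are arbitrary with |q| < 1):

```python
import mpmath as mp

for q in (mp.mpf("0.1"), mp.mpf("0.35"), mp.mpc(0.2, 0.1)):
    t2, t3, t4 = (mp.jtheta(k, 0, q) for k in (2, 3, 4))
    print(mp.chop(t2**4 + t4**4 - t3**4))   # ~ 0: theta2^4 + theta4^4 = theta3^4
```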
the nome q, where $w=e^{\pi iz}$ and $q=e^{\pi i\tau}$. In this form, the functions become We see that the theta functions can also be defined in terms of w and q, without a direct reference to the exponential function. These formulas can, therefore, be used to define the theta functions over other fields where the exponential function might not be everywhere defined, such as fields of p-adic numbers. The Jacobi triple product (a special case of
the surface of the Earth as a set of points. We can measure the distance between two such points by the length of the shortest path along the surface, "as the crow flies"; this is particularly useful for shipping and aviation. We can also measure the straight-line distance between two points through the Earth's interior; this notion is, for example, natural in seismology, since it roughly corresponds to
the two-sample test can also be performed under more general conditions that allow for discontinuity, heterogeneity and dependence across samples. The two-sample K–S test is one of the most useful and general nonparametric methods for comparing two samples, as it is sensitive to differences in both location and shape of the empirical cumulative distribution functions of the two samples. The Kolmogorov–Smirnov test can be modified to serve as
the Euclidean metric and its subspace the interval (0, 1) with the induced metric are homeomorphic but have very different metric properties. Conversely, not every topological space can be given a metric. Topological spaces which are compatible with a metric are called metrizable and are particularly well-behaved in many ways: in particular, they are paracompact Hausdorff spaces (hence normal) and first-countable. The Nagata–Smirnov metrization theorem gives
the Kolmogorov–Smirnov statistic is $D_{n,m}=\sup_x\left|F_{1,n}(x)-F_{2,m}(x)\right|,$ where $F_{1,n}$ and $F_{2,m}$ are the empirical distribution functions of the first and the second sample respectively, and $\sup$ is the supremum function. For large samples, the null hypothesis
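A hedged sketch of the large-sample rule (the rejection condition and c(α) as given above), compared with SciPy's asymptotic two-sample p-value; the samples, shift, and seed are arbitrary, and the `method` argument assumes a recent SciPy.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
a = rng.normal(size=400)
b = rng.normal(loc=0.3, size=300)            # shifted, so the null is false
n, m, alpha = a.size, b.size, 0.05

res = stats.ks_2samp(a, b, method="asymp")
c_alpha = np.sqrt(-np.log(alpha / 2) / 2)
reject = res.statistic > c_alpha * np.sqrt((n + m) / (n * m))

# The two decisions should usually coincide for large samples.
print(res.statistic, reject, res.pvalue < alpha)
```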
the Kolmogorov–Smirnov test can be constructed by using the critical values of the Kolmogorov distribution. This test is asymptotically valid when $n\to\infty$. It rejects the null hypothesis at level $\alpha$ if $\sqrt{n}\,D_n>K_\alpha,$ where $K_\alpha$ is found from $\operatorname{Pr}(K\le K_\alpha)=1-\alpha.$ The asymptotic power of this test
the Kolmogorov–Smirnov test when comparing two distribution functions. Two-sample KS tests have been applied in economics to detect asymmetric effects and to study natural experiments. While the Kolmogorov–Smirnov test is usually used to test whether a given F(x) is the underlying probability distribution of $F_n(x)$, the procedure may be inverted to give confidence limits on F(x) itself. If one chooses
the above theta functions, we obtain four functions of τ only, defined on the upper half-plane. These functions are called theta Nullwert functions, after the German term for "zero value", referring to the annullation of the left entry in the theta function expression. Alternatively, we obtain four functions of q only, defined on the unit disk $|q|<1$. They are sometimes called theta constants: with
the assumption that $F(x)$ is non-decreasing and right-continuous, with a countable (possibly infinite) number of jumps, the KS test statistic can be expressed as: $D_n=\sup_x|F_n(x)-F(x)|=\sup_{0\le t\le 1}\left|F_n(F^{-1}(t))-F(F^{-1}(t))\right|.$ From the right-continuity of $F(x)$, it follows that $F(F^{-1}(t))\ge t$ and $F^{-1}(F(x))\le x$ and hence,
the auxiliary theta functions are In particular, $\lim_{q\to 0}\frac{\vartheta_{10}(z\mid q)}{2q^{\frac{1}{4}}}=\cos(\pi z),\qquad\lim_{q\to 0}\frac{-\vartheta_{11}(z\mid q)}{2q^{\frac{1}{4}}}=\sin(\pi z),$ so we may interpret them as one-parameter deformations of
the basic notions of mathematical analysis, including balls, completeness, as well as uniform, Lipschitz, and Hölder continuity, can be defined in the setting of metric spaces. Other notions, such as continuity, compactness, and open and closed sets, can be defined for metric spaces, but also in the even more general setting of topological spaces. To see the utility of different notions of distance, consider
the complementary distribution functions. Thus the maximum difference will differ depending on which of $\Pr(X<x\land Y<y)$ or $\Pr(X<x\land Y>y)$ or any of the other two possible arrangements
the complements of open sets. Sets may be both open and closed, as well as neither open nor closed. This topology does not carry all the information about the metric space. For example, the distances $d_1$, $d_2$, and $d_\infty$ defined above all induce the same topology on $\mathbb{R}^2$, although they behave differently in many respects. Similarly, $\mathbb{R}$ with
the condition reads Here, again, the larger the sample sizes, the more sensitive the minimal bound: for a given ratio of sample sizes (e.g. $m=n$), the minimal bound scales in the size of either of the samples according to its inverse square root. Note that the two-sample test checks whether the two data samples come from the same distribution. This does not specify what that common distribution
the data $X_i$, the critical values determined in this way are invalid. In such cases, Monte Carlo or other methods may be required, but tables have been prepared for some cases. Details of the required modifications to the test statistic and of the critical values for the normal distribution and the exponential distribution have been published, and later publications also include the Gumbel distribution. The Lilliefors test represents
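One hedged sketch of the Monte Carlo route for the normal case with estimated parameters (a Lilliefors-style simulation of the null distribution of D_n; the replication count and seed are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

def stat_with_estimated_params(sample):
    # Parameters of the reference normal are estimated from the data itself.
    mu, sigma = sample.mean(), sample.std(ddof=1)
    return stats.kstest(sample, "norm", args=(mu, sigma)).statistic

x = rng.normal(loc=3.0, scale=2.0, size=50)
d_obs = stat_with_estimated_params(x)

# Rebuild the null distribution of the statistic by refitting on each draw;
# for a location-scale family it does not depend on the true mu and sigma.
sims = np.array([stat_with_estimated_params(rng.normal(size=x.size))
                 for _ in range(2000)])
print(d_obs, np.mean(sims >= d_obs))   # Monte Carlo p-value
```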
the dgof package of the R language. Major statistical packages, among which SAS PROC NPAR1WAY and Stata ksmirnov, implement the KS test under the assumption that $F(x)$ is continuous, which is more conservative if the null distribution is actually not continuous. The Kolmogorov–Smirnov test may also be used to test whether two underlying one-dimensional probability distributions differ. In this case,
the differential equation actually makes sense. A metric space M is bounded if there is an r such that no pair of points in M is more than distance r apart. The least such r is called the diameter of M. The space M is called precompact or totally bounded if for every r > 0 there is a finite cover of M by open balls of radius r. Every totally bounded space
the discrete metric no longer remembers that the set is a plane, but treats it just as an undifferentiated set of points. All of these metrics make sense on $\mathbb{R}^n$ as well as $\mathbb{R}^2$. Given a metric space (M, d) and a subset $A\subseteq M$, we can consider A to be
the discrete metric. Compactness is a topological property which generalizes the properties of a closed and bounded subset of Euclidean space. There are several equivalent definitions of compactness in metric spaces: One example of a compact space is the closed interval [0, 1]. Compactness is important for similar reasons to completeness: it makes it easy to find limits. Another important tool
the distance function $d(x,y)=|y-x|$ given by the absolute difference form a metric space. Many properties of metric spaces and functions between them are generalizations of concepts in real analysis and coincide with those concepts when applied to the real line. The Euclidean plane $\mathbb{R}^2$ can be equipped with many different metrics. The Euclidean distance familiar from school mathematics can be defined by $d_2((x_1,y_1),(x_2,y_2))=\sqrt{(x_2-x_1)^2+(y_2-y_1)^2}.$ The taxicab or Manhattan distance
the distance you need to travel along horizontal and vertical lines to get from one point to the other, as illustrated at the top of the article. The maximum, $L^\infty$, or Chebyshev distance is defined by $d_\infty((x_1,y_1),(x_2,y_2))=\max\{|x_2-x_1|,\,|y_2-y_1|\}.$ This distance does not have an easy explanation in terms of paths in
the distribution of $D_n$ depends on the null distribution $F(x)$, i.e., it is no longer distribution-free as in the continuous case. Therefore, a fast and accurate method has been developed to compute the exact and asymptotic distribution of $D_n$ when $F(x)$
the field of non-Euclidean geometry through the use of the Cayley–Klein metric. The idea of an abstract space with metric properties was addressed in 1906 by René Maurice Fréchet, and the term metric space was coined by Felix Hausdorff in 1914. Fréchet's work laid the foundation for understanding convergence, continuity, and other key concepts in non-geometric spaces. This allowed mathematicians to study functions and sequences in
the form of the Kolmogorov–Smirnov test statistic and its asymptotic distribution under the null hypothesis were published by Andrey Kolmogorov, while a table of the distribution was published by Nikolai Smirnov. Recurrence relations for the distribution of the test statistic in finite samples are available. Under the null hypothesis that the sample comes from the hypothesized distribution F(x), $\sqrt{n}\,D_n\to\sup_t|B(F(t))|$ in distribution, where B(t)
the formula $d_\infty(p,q)\le d_2(p,q)\le d_1(p,q)\le 2d_\infty(p,q),$ which holds for every pair of points $p,q\in\mathbb{R}^2$. A radically different distance can be defined by setting $d(p,q)=\begin{cases}0,&\text{if }p=q,\\1,&\text{otherwise,}\end{cases}$ or, using Iverson brackets, $d(p,q)=[p\neq q]$. In this discrete metric, all distinct points are 1 unit apart: none of them are close to each other, and none of them are very far away from each other either. Intuitively,
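A random spot-check of this chain of inequalities (a numerical illustration, not a proof):

```python
import numpy as np

rng = np.random.default_rng(6)
for _ in range(1000):
    p, q = rng.normal(size=2), rng.normal(size=2)
    diff = np.abs(p - q)
    d1, d2, dinf = diff.sum(), np.sqrt((diff**2).sum()), diff.max()
    # d_inf <= d_2 <= d_1 <= 2 * d_inf (small tolerance for float round-off)
    assert dinf <= d2 + 1e-12 and d2 <= d1 + 1e-12 and d1 <= 2 * dinf + 1e-12
```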
the function $\vartheta(z,\tau)$ is unbounded, as required by Liouville's theorem. It is in fact the most general entire function with 2 quasi-periods, in the following sense: Theorem. If $f\colon\mathbb{C}\to\mathbb{C}$
the length of time it takes for seismic waves to travel between those two points. The notion of distance encoded by the metric space axioms has relatively few requirements. This generality gives metric spaces a lot of flexibility. At the same time, the notion is strong enough to encode many intuitive facts about what distance means. This means that general results about metric spaces can be applied in many different contexts. Like many fundamental mathematical concepts,
the limit when $n$ goes to infinity. Kolmogorov strengthened this result by effectively providing the rate of this convergence (see Kolmogorov distribution). Donsker's theorem provides a yet stronger result. In practice, the statistic requires a relatively large number of data points (in comparison to other goodness of fit criteria such as the Anderson–Darling test statistic) to properly reject
the limiting distribution does not depend on the marginal distributions. The Kolmogorov–Smirnov test is implemented in many software programs. Most of these implement both the one- and two-sample tests. Metric (mathematics) In mathematics, a metric space is a set together with a notion of distance between its elements, usually called points. The distance is measured by
the metric d is unambiguous, one often refers by abuse of notation to "the metric space M". By taking all axioms except the second, one can show that distance is always non-negative: $0=d(x,x)\le d(x,y)+d(y,x)=2d(x,y).$ Therefore
the metric on a metric space can be interpreted in many different ways. A particular metric may not be best thought of as measuring physical distance, but instead as the cost of changing from one state to another (as with Wasserstein metrics on spaces of measures) or the degree of difference between two objects (for example, the Hamming distance between two strings of characters, or the Gromov–Hausdorff distance between metric spaces themselves). Formally,
the metric. For example, a curve in a metric space is rectifiable (has finite length) if and only if it has a Lipschitz reparametrization. Jacobi theta function In mathematics, theta functions are special functions of several complex variables. They show up in many topics, including Abelian varieties, moduli spaces, quadratic forms, and solitons. As Grassmann algebras, they appear in quantum field theory. The most common form of theta function
the null hypothesis. The Kolmogorov distribution is the distribution of the random variable $K=\sup_{t\in[0,1]}|B(t)|,$ where B(t) is the Brownian bridge. The cumulative distribution function of K is given by $\operatorname{Pr}(K\le x)=1-2\sum_{k=1}^{\infty}(-1)^{k-1}e^{-2k^{2}x^{2}}=\frac{\sqrt{2\pi}}{x}\sum_{k=1}^{\infty}e^{-(2k-1)^{2}\pi^{2}/(8x^{2})},$ which can also be expressed by the Jacobi theta function $\vartheta_{01}(z=0;\tau=2ix^{2}/\pi)$. Both
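The alternating series converges very fast; a minimal sketch truncating it at k ≤ 100 and comparing with scipy.stats.kstwobign (SciPy's name for this limiting distribution of √n D_n):

```python
import numpy as np
from scipy import stats

def kolmogorov_cdf(x, terms=100):
    """Pr(K <= x) via the alternating series, truncated at `terms`."""
    k = np.arange(1, terms + 1)
    return 1 - 2 * np.sum((-1.0) ** (k - 1) * np.exp(-2 * k**2 * x**2))

for x in (0.5, 1.0, 1.5):
    print(x, kolmogorov_cdf(x), stats.kstwobign.cdf(x))
```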
the one-sample case) or that the samples are drawn from the same distribution (in the two-sample case). In the one-sample case, the distribution considered under the null hypothesis may be continuous (see Section 2), purely discrete or mixed (see Section 2.2). In the two-sample case (see Section 3), the distribution considered under the null hypothesis is a continuous distribution but is otherwise unrestricted. However,
the other metrics described above. Two examples of spaces which are not complete are (0, 1) and the rationals, each with the metric induced from $\mathbb{R}$. One can think of (0, 1) as "missing" its endpoints 0 and 1. The rationals are missing all the irrationals, since any irrational has a sequence of rationals converging to it in $\mathbb{R}$ (for example, its successive decimal approximations). These examples show that completeness
the periodic functions $\sin,\cos$, again validating the interpretation of the theta function as the most general 2 quasi-period function. The Jacobi theta functions have the following integral representations: The Theta Nullwert function $\theta_3(q)$ has this integral identity: This formula
the plane, but it still satisfies the metric space axioms. It can be thought of similarly to the number of moves a king would have to make on a chessboard to travel from one point to another on the given space. In fact, these three distances, while they have distinct properties, are similar in some ways. Informally, points that are close in one are close in the others, too. This observation can be quantified with
the real line. Arthur Cayley, in his article "On Distance", extended metric concepts beyond Euclidean geometry into domains bounded by a conic in a projective space. His distance was given by the logarithm of a cross ratio. Any projectivity leaving the conic stable also leaves the cross ratio constant, so isometries are implicit. This method provides models for elliptic geometry and hyperbolic geometry, and Felix Klein, in several publications, established
the real numbers are the completion of the rationals. Since complete spaces are generally easier to work with, completions are important throughout mathematics. For example, in abstract algebra, the p-adic numbers are defined as the completion of the rationals under a different metric. Completion is particularly common as a tool in functional analysis. Often one has a set of nice functions and
the reciprocal of the Gelfond constant is raised to the power of the reciprocal of an odd number, then the corresponding $\vartheta_{00}$ values or $\phi$ values can be represented in a simplified way by using the hyperbolic lemniscatic sine. With the letter $\varpi$
the second axiom can be weakened to "if $x\neq y$, then $d(x,y)\neq 0$" and combined with the first to give $d(x,y)=0\iff x=y$. The real numbers with
the set of points that are strictly less than distance r from x: $B_r(x)=\{y\in M: d(x,y)<r\}.$ This is a natural way to define a set of points that are relatively close to x. Therefore, a set $N\subseteq M$
the star discrepancy is hard to calculate in high dimensions. In 2021, a functional form of the multivariate KS test statistic was proposed, which simplified the problem of estimating the tail probabilities needed for the statistical test. For the multivariate case, if $F_i$ is the i-th continuous marginal from a probability distribution with k variables, then so
the test is less powerful for testing normality than the Shapiro–Wilk test or the Anderson–Darling test. However, these other tests have their own disadvantages. For instance, the Shapiro–Wilk test is known not to work well in samples with many identical values. The empirical distribution function $F_n$ for n independent and identically distributed (i.i.d.) ordered observations $X_i$
the two-dimensional sphere $S^2$ as a subset of $\mathbb{R}^3$, the Euclidean metric on $\mathbb{R}^3$ induces the straight-line metric on $S^2$ described above. Two more useful examples are the open interval (0, 1) and the closed interval [0, 1] thought of as subspaces of
the ε–δ definition of continuity is the order of quantifiers: the choice of δ must depend only on ε and not on the point x. However, this subtle change makes a big difference. For example, uniformly continuous maps take Cauchy sequences in $M_1$ to Cauchy sequences in $M_2$. In other words, uniform continuity preserves some metric properties which are not purely topological. On the other hand,
was discussed in the essay "Square series generating function transformations" by the mathematician Maxie Schmidt from Georgia in Atlanta. Based on this formula, the following three eminent examples are given: Furthermore, the theta examples $\theta_3(\tfrac{1}{2})$ and $\theta_3(\tfrac{1}{3})$ shall be displayed: Proper credit for most of these results goes to Ramanujan. See Ramanujan's lost notebook and