In statistics, a simple random sample (or SRS) is a subset of individuals (a sample) chosen from a larger set (a population) in which each individual is chosen randomly, entirely by chance, with the same probability as every other. In SRS, each subset of k individuals has the same probability of being chosen for the sample as any other subset of k individuals. Simple random sampling is a basic type of sampling and can be a component of more complex sampling methods.
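To make the definition concrete, here is a minimal Python sketch of drawing an SRS without replacement; the population of labelled integers and the sample size are illustrative assumptions, not part of the definition:

```python
import random

population = list(range(1000))  # stand-in for a population of 1,000 units

# Draw a simple random sample of size 10 without replacement:
# every subset of 10 units is equally likely to be chosen.
sample = random.sample(population, 10)
print(sample)
```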
The principle of simple random sampling is that every set with the same number of items has the same probability of being chosen. For example, suppose N college students want to get a ticket for a basketball game, but there are only X < N tickets for them, so they decide to have a fair way to see who gets to go. Then, everybody is given a number in the range from 0 to N − 1, and random numbers are generated, either electronically or from
A biased coin comes up heads with probability 0.3 when tossed. The probability of seeing exactly 4 heads in 6 tosses is $\Pr(X=4)=\binom{6}{4}(0.3)^4(0.7)^2\approx 0.0595$. The cumulative distribution function can be expressed as $F(k;n,p)=\Pr(X\le k)=\sum_{i=0}^{\lfloor k\rfloor}\binom{n}{i}p^i(1-p)^{n-i}$, where $\lfloor k\rfloor$ is the "floor" under k, i.e. the greatest integer less than or equal to k. It can also be represented in terms of
a (shorter) exchangeable sequence of 0s and 1s with probability 1/2. Partition the sequence into non-overlapping pairs: if the two elements of the pair are equal (00 or 11), discard it; if the two elements of the pair are unequal (01 or 10), keep the first. This yields a sequence of Bernoulli trials with $p=1/2$, as, by exchangeability,
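A minimal sketch of this extraction procedure; the source bias and sequence length below are arbitrary illustrative choices, and the function name is hypothetical:

```python
import random

def von_neumann_extractor(bits):
    # Examine non-overlapping pairs: discard 00 and 11,
    # keep the first element of each 01 or 10 pair.
    out = []
    for i in range(0, len(bits) - 1, 2):
        a, b = bits[i], bits[i + 1]
        if a != b:  # 01 and 10 are equally likely by exchangeability
            out.append(a)
    return out

# Even a heavily biased source yields roughly fair output bits.
biased = [1 if random.random() < 0.9 else 0 for _ in range(10_000)]
fair = von_neumann_extractor(biased)
print(sum(fair) / len(fair))  # close to 0.5
```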
a) and Bernoulli(p) distribution): Asymptotically, this bound is reasonably tight; see the references for details. One can also obtain lower bounds on the tail $F(k;n,p)$, known as anti-concentration bounds. By approximating the binomial coefficient with Stirling's formula it can be shown that $F(k;n,p)\ge \frac{1}{\sqrt{8n\frac{k}{n}(1-\frac{k}{n})}}\exp\left(-nD\!\left(\tfrac{k}{n}\parallel p\right)\right)$, which implies the simpler but looser bound $F(k;n,p)\ge \frac{1}{\sqrt{2n}}\exp\left(-nD\!\left(\tfrac{k}{n}\parallel p\right)\right)$. For p = 1/2 and k ≥ 3n/8 for even n, it
a complete sampling frame, which may not be available or feasible to construct for large populations. Even if a complete frame is available, more efficient approaches may be possible if other useful information is available about the units in the population. Advantages are that it is free of classification error, and it requires minimum previous knowledge of the population other than the frame. Its simplicity also makes it relatively easy to interpret data collected in this manner. For these reasons, simple random sampling best suits situations where not much information
a particular sample is a perfect representation of the population. Simple random sampling merely allows one to draw externally valid conclusions about the entire population based on the sample. The concept can be extended when the population is a geographic area. In this case, area sampling frames are relevant. Conceptually, simple random sampling is the simplest of the probability sampling techniques. It requires
a property which is used in various ways, such as in Wald's confidence intervals. A closed-form Bayes estimator for p also exists when using the Beta distribution as a conjugate prior distribution. When using a general $\operatorname{Beta}(\alpha,\beta)$ as a prior, the posterior mean estimator is $\widehat{p}_b=\frac{x+\alpha}{n+\alpha+\beta}$. The Bayes estimator
a successful result, then the expected value of X is $\operatorname{E}[X]=np$. This follows from the linearity of the expected value along with the fact that X is the sum of n identical Bernoulli random variables, each with expected value p. In other words, if $X_1,\ldots,X_n$ are identical (and independent) Bernoulli random variables with parameter p, then $X=X_1+\cdots+X_n$ and $\operatorname{E}[X]=\operatorname{E}[X_1]+\cdots+\operatorname{E}[X_n]=np$. The variance is $\operatorname{Var}(X)=np(1-p)$. This similarly follows from
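The identities E[X] = np and Var(X) = np(1 − p) can be checked by simulation; this sketch uses arbitrary illustrative parameters:

```python
import random

n, p, trials = 100, 0.3, 20_000

# Simulate X = X_1 + ... + X_n as a sum of n Bernoulli(p) indicators.
samples = [sum(random.random() < p for _ in range(n)) for _ in range(trials)]

mean = sum(samples) / trials
var = sum((x - mean) ** 2 for x in samples) / trials
print(mean, n * p)           # empirical mean vs. np
print(var, n * p * (1 - p))  # empirical variance vs. np(1-p)
```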
a table of random numbers. Numbers outside the range from 0 to N − 1 are ignored, as are any numbers previously selected. The first X numbers would identify the lucky ticket winners. In small populations and often in large ones, such sampling is typically done "without replacement", i.e., one deliberately avoids choosing any member of the population more than once. Although simple random sampling can be conducted with replacement instead, this
is asymptotically efficient and as the sample size approaches infinity (n → ∞), it approaches the MLE solution. The Bayes estimator is biased (how much depends on the priors), admissible and consistent in probability. The Bayesian estimator with the Beta prior can be used with Thompson sampling. For the special case of using the standard uniform distribution as a non-informative prior, $\operatorname{Beta}(\alpha=1,\beta=1)=U(0,1)$,
is a close approximation to the i.i.d. model.) An infinite exchangeable sequence is strictly stationary and so a law of large numbers in the form of the Birkhoff–Khinchin theorem applies. This means that the underlying distribution can be given an operational interpretation as the limiting empirical distribution of the sequence of values. The close relationship between exchangeable sequences of random variables and
is a finite or infinite sequence $X_1,X_2,X_3,\ldots$ of random variables such that for any finite permutation σ of the indices 1, 2, 3, ... (the permutation acts on only finitely many indices, with the rest fixed), the joint probability distribution of the permuted sequence is the same as the joint probability distribution of the original sequence. (A sequence $E_1,E_2,E_3,\ldots$ of events
is a mode. In general, there is no single formula to find the median for a binomial distribution, and it may even be non-unique. However, several special results have been established: For k ≤ np, upper bounds can be derived for the lower tail of the cumulative distribution function $F(k;n,p)=\Pr(X\le k)$,
is a sequential algorithm and requires knowledge of the total count of items n, which is not available in streaming scenarios. A very simple random sort algorithm was proposed by Sunter in 1977. The algorithm simply assigns a random number drawn from the uniform distribution (0, 1) as a key to each item, then sorts all items using
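A sketch of this random-sort idea, assuming the whole population fits in memory (the helper name and parameters are illustrative):

```python
import random

def random_sort_sample(items, k):
    # Attach an independent Uniform(0, 1) key to each item,
    # sort on the keys, and keep the k items with the smallest keys.
    keyed = [(random.random(), item) for item in items]
    keyed.sort(key=lambda pair: pair[0])
    return [item for _, item in keyed[:k]]

print(random_sort_sample(range(100), 5))
```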
is also consistent both in probability and in MSE. This statistic is asymptotically normal thanks to the central limit theorem, because it is the same as taking the mean over Bernoulli samples. It has a variance of $\operatorname{var}(\widehat{p})=\frac{p(1-p)}{n}$,
is also a fixed value which does not depend on the particular random variables in the sequence. There is a weaker lower bound than for infinite exchangeability and it is possible for negative correlation to exist. Covariance for exchangeable sequences (infinite): If the sequence $X_1,X_2,X_3,\ldots$
is an integer, then $(n+1)p-1$ and $(n+1)p$ are both modes. In the case that $(n+1)p-1\notin \mathbb{Z}$, only $\lfloor (n+1)p-1\rfloor +1=\lfloor (n+1)p\rfloor$
is available about the population and data collection can be efficiently conducted on randomly distributed items, or where the cost of sampling is small enough to make efficiency less important than simplicity. If these conditions do not hold, stratified sampling or cluster sampling may be a better choice. A sampling method for which each individual unit has the same chance of being selected
is called equal probability sampling (epsem for short). Using a simple random sample will always lead to an epsem, but not all epsem samples are SRS. For example, if a teacher has a class arranged in 5 rows of 6 columns and she wants to take a random sample of 5 students she might pick one of the 6 columns at random. This would be an epsem sample but not all subsets of 5 pupils are equally likely here, as only
is called a Bernoulli process; for a single trial, i.e., n = 1, the binomial distribution is a Bernoulli distribution. The binomial distribution is the basis for the binomial test of statistical significance. The binomial distribution is frequently used to model the number of successes in a sample of size n drawn with replacement from a population of size N. If the sampling is carried out without replacement,
is equal to 0 or 1, the mode will be 0 and n correspondingly. These cases can be summarized as follows: the mode is $\lfloor (n+1)p\rfloor$ if $(n+1)p$ is 0 or a noninteger, both $(n+1)p$ and $(n+1)p-1$ if $(n+1)p\in \{1,\dots ,n\}$, and $n$ if $(n+1)p=n+1$. Proof: Let $f(k)=\binom{n}{k}p^k(1-p)^{n-k}$. For $p=0$ only $f(0)$ has a nonzero value, with $f(0)=1$. For $p=1$ we find $f(n)=1$ and $f(k)=0$ for $k\neq n$. This proves that
is exchangeable, then $\operatorname{cov}(X_i,X_j)\ge 0$ for $i\neq j$. Covariance for exchangeable sequences (finite): If $X_1,X_2,\ldots,X_n$ is exchangeable with $\sigma^2=\operatorname{var}(X_i)$, then $\operatorname{cov}(X_i,X_j)\ge -\frac{\sigma^2}{n-1}$ for $i\neq j$. The finite sequence result may be proved as follows. Using
is exchangeable. This follows directly from the structure of the joint probability distribution generated by the i.i.d. form. Mixtures of exchangeable sequences (in particular, sequences of i.i.d. variables) are exchangeable. The converse can be established for infinite sequences, through an important representation theorem by Bruno de Finetti (later extended by other probability theorists such as Halmos and Savage). The extended versions of
is however not very tight. In particular, for p = 1, we have that F(k;n,p) = 0 (for fixed k, n with k < n), but Hoeffding's bound evaluates to a positive constant. A sharper bound can be obtained from the Chernoff bound: $F(k;n,p)\le \exp\left(-nD\!\left(\tfrac{k}{n}\parallel p\right)\right)$ for $k/n<p$, where D(a ∥ p) is the relative entropy (or Kullback–Leibler divergence) between an a-coin and a p-coin (i.e. between the Bernoulli(
is less common and would normally be described more fully as simple random sampling with replacement. Sampling done without replacement is no longer independent, but still satisfies exchangeability, hence most results of mathematical statistics still hold. Further, for a small sample from a large population, sampling without replacement is approximately the same as sampling with replacement, since
is possible to make the denominator constant: $F(k;n,1/2)\ge \frac{1}{15}\exp\left(-16n\left(\tfrac{1}{2}-\tfrac{k}{n}\right)^2\right)$. When n is known, the parameter p can be estimated using the proportion of successes, $\widehat{p}=\frac{x}{n}$. This estimator is found using maximum likelihood estimation and also the method of moments. This estimator is unbiased and uniformly attains minimum variance, as proven using the Lehmann–Scheffé theorem, since it is based on a minimal sufficient and complete statistic (i.e.: x). It
is said to be exchangeable precisely if the sequence of its indicator functions is exchangeable.) The distribution function $F_{X_1,\ldots,X_n}(x_1,\ldots,x_n)$ of a finite sequence of exchangeable random variables is symmetric in its arguments $x_1,\ldots,x_n$. Olav Kallenberg provided an appropriate definition of exchangeability for continuous-time stochastic processes. The concept
is similar to cluster sampling, since the choice of the first unit will determine the remainder. This is no longer simple random sampling, because some combinations of 100 students have a larger selection probability than others – for instance, {3, 13, 23, ..., 993} has a 1/10 chance of selection, while {1, 2, 3, ..., 100} cannot be selected under this method. If the members of the population come in three kinds, say "blue", "red" and "black",
is that de Finetti's theorem characterizes exchangeable sequences as mixtures of i.i.d. sequences—while an exchangeable sequence need not itself be unconditionally i.i.d., it can be expressed as a mixture of underlying i.i.d. sequences. This means that infinite sequences of exchangeable random variables can be regarded equivalently as sequences of conditionally i.i.d. random variables, based on some underlying distributional form. (Note that this equivalence does not quite hold for finite exchangeability. However, for finite vectors of random variables there
is the discrete probability distribution of the number of successes in a sequence of n independent experiments, each asking a yes–no question, and each with its own Boolean-valued outcome: success (with probability p) or failure (with probability q = 1 − p). A single success/failure experiment is also called a Bernoulli trial or Bernoulli experiment, and a sequence of outcomes
is the draw-by-draw algorithm where at each step we remove the item at that step from the set with equal probability and put the item in the sample. We continue until we have a sample of desired size k. The drawback of this method is that it requires random access in the set. The selection-rejection algorithm developed by Fan et al. in 1962 requires a single pass over data; however, it
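A sketch in the spirit of this sequential selection-rejection scheme, assuming the total count n is known up front (the function name and parameters are illustrative):

```python
import random

def selection_rejection_sample(items, k):
    # One pass: item i is selected with probability
    # (sample slots still needed) / (items still remaining).
    n = len(items)  # total count must be known in advance
    sample, needed = [], k
    for i, item in enumerate(items):
        if random.random() < needed / (n - i):
            sample.append(item)
            needed -= 1
            if needed == 0:
                break
    return sample

print(selection_rejection_sample(list(range(1000)), 10))
```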
the Stirling numbers of the second kind, and $n^{\underline{k}}=n(n-1)\cdots (n-k+1)$ is the $k$th falling power of $n$. A simple bound follows by bounding
the mode of a binomial B(n, p) distribution is equal to $\lfloor (n+1)p\rfloor$, where $\lfloor \cdot \rfloor$ is the floor function. However, when (n + 1)p is an integer and p is neither 0 nor 1, then the distribution has two modes: (n + 1)p and (n + 1)p − 1. When p
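A quick numeric check of the mode formula against a brute-force argmax of the mass function (parameters chosen arbitrarily):

```python
from math import comb, floor

def pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3
mode_formula = floor((n + 1) * p)  # floor((n+1)p)
mode_argmax = max(range(n + 1), key=lambda k: pmf(k, n, p))
print(mode_formula, mode_argmax)   # both 3 for these parameters
```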
the n trials. The binomial distribution is concerned with the probability of obtaining any of these sequences, meaning the probability of obtaining one of them ($p^kq^{n-k}$) must be added $\binom{n}{k}$ times, hence $\Pr(X=k)=\binom{n}{k}p^k(1-p)^{n-k}$. In creating reference tables for binomial distribution probability, usually,
the regularized incomplete beta function, as follows: $F(k;n,p)=\Pr(X\le k)=I_{1-p}(n-k,k+1)$, which is equivalent to the cumulative distribution function of the F-distribution. Some closed-form bounds for the cumulative distribution function are given below. If X ~ B(n, p), that is, X is a binomially distributed random variable, n being the total number of experiments and p the probability of each experiment yielding
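Assuming SciPy is available, the incomplete-beta representation can be checked numerically against the direct CDF (parameters illustrative):

```python
from scipy.special import betainc  # regularized incomplete beta I_x(a, b)
from scipy.stats import binom

n, p, k = 20, 0.4, 7
print(binom.cdf(k, n, p))            # direct evaluation of F(k; n, p)
print(betainc(n - k, k + 1, 1 - p))  # I_{1-p}(n - k, k + 1), same value
```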
the binomial moments via the higher Poisson moments. This shows that if $c=O({\sqrt {np}})$, then $\operatorname{E}[X^c]$ is at most a constant factor away from $\operatorname{E}[X]^c$. Usually
the draws are not independent and so the resulting distribution is a hypergeometric distribution, not a binomial one. However, for N much larger than n, the binomial distribution remains a good approximation, and is widely used. If the random variable X follows the binomial distribution with parameters $n\in \mathbb{N}$ and p ∈ [0, 1], we write X ~ B(n, p). The probability of getting exactly k successes in n independent Bernoulli trials (with
the empirical distribution is always well-defined.) This means that for any vector of random variables in the sequence we have joint distribution function given by $\Pr(X_1\le x_1,\ldots,X_n\le x_n)=\int \prod_{i=1}^{n}F(x_i)\,dP(F)$. If the distribution function $F_{\mathbf{X}}$ is indexed by another parameter $\theta$ then (with densities appropriately defined) we have $p_{X_1,\ldots,X_n}(x_1,\ldots,x_n)=\int \prod_{i=1}^{n}p(x_i\mid \theta)\,dP(\theta)$. These equations show
the estimator $\widehat{p}_b=\frac{x+\frac{1}{2}}{n+1}$. When estimating p with very rare events and a small n (e.g.: if x = 0), then using the standard estimator leads to $\widehat{p}=0$, which sometimes is unrealistic and undesirable. In such cases there are various alternative estimators. One way is to use the Bayes estimator $\widehat{p}_b$, leading to $\widehat{p}_b=\frac{x+1}{n+2}$. Another method
the exception of the case where (n + 1)p is an integer. In this case, there are two values for which f is maximal: (n + 1)p and (n + 1)p − 1. M is the most probable outcome (that is, the most likely, although this can still be unlikely overall) of the Bernoulli trials and is called the mode. Equivalently, M − p < np ≤ M + 1 − p. Taking the floor function, we obtain M = floor(np). Suppose
the fact that the values are exchangeable, we have $0\le \operatorname{var}(X_1+\cdots+X_n)=n\sigma^2+n(n-1)\operatorname{cov}(X_1,X_2)$. We can then solve the inequality for the covariance, yielding the stated lower bound. The non-negativity of the covariance for the infinite sequence can then be obtained as a limiting result from this finite sequence result. Equality of the lower bound for finite sequences is achieved in a simple urn model: An urn contains 1 red marble and n − 1 green marbles, and these are sampled without replacement until
the fact that the variance of a sum of independent random variables is the sum of the variances. The first central moments, defined as $\mu_c=\operatorname{E}\left[(X-\operatorname{E}[X])^c\right]$, are given by $\mu_1=0$, $\mu_2=np(1-p)$, $\mu_3=np(1-p)(1-2p)$, $\mu_4=np(1-p)\left(1+(3n-6)p(1-p)\right)$, and so on. The non-central moments satisfy $\operatorname{E}[X]=np$, $\operatorname{E}[X^2]=np(1-p)+n^2p^2$, and in general $\operatorname{E}[X^c]=\sum_{k=0}^{c}\left\{{c \atop k}\right\}n^{\underline{k}}p^k$, where $\left\{{c \atop k}\right\}$ are
the gaps.

Exchangeable random variables

In statistics, an exchangeable sequence of random variables (also sometimes interchangeable) is a sequence $X_1,X_2,X_3,\ldots$ (which may be finitely or infinitely long) whose joint probability distribution does not change when the positions in the sequence in which finitely many of them appear are altered. In other words,
the i.i.d. form means that the latter can be justified on the basis of infinite exchangeability. This notion is central to Bruno de Finetti's development of predictive inference and to Bayesian statistics. It can also be shown to be a useful foundational assumption in frequentist statistics and to link the two paradigms. The representation theorem: This statement is based on the presentation in O'Neill (2009) in references below. Given an infinite sequence of random variables $\mathbf{X}=(X_1,X_2,X_3,\ldots)$ we define
the inverse of selection probability for each sample is equal. Consider a school with 1000 students, and suppose that a researcher wants to select 100 of them for further study. All their names might be put in a bucket and then 100 names might be pulled out. Not only does each person have an equal chance of being selected, we can also easily calculate the probability (P) of a given person being chosen, since we know
the joint distribution is invariant to finite permutation. Thus, for example the sequences $X_1,X_2,X_3,X_4,X_5,X_6$ and $X_3,X_6,X_1,X_5,X_2,X_4$ both have the same joint probability distribution. It is closely related to the use of independent and identically distributed random variables in statistical models. Exchangeable sequences of random variables arise in cases of simple random sampling. Formally, an exchangeable sequence of random variables
the joint distribution or density characterised as a mixture distribution based on the underlying limiting empirical distribution (or a parameter indexing this distribution). Note that not all finite exchangeable sequences are mixtures of i.i.d. To see this, consider sampling without replacement from a finite set until no elements are left. The resulting sequence is exchangeable, but not a mixture of i.i.d. Indeed, conditioned on all other elements in
the key and selects the smallest k items. J. Vitter in 1985 proposed reservoir sampling algorithms, which are widely used. This algorithm does not require knowledge of the size of the population n in advance, and uses constant space. Random sampling can also be accelerated by sampling from the distribution of gaps between samples and skipping over
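A sketch of the basic reservoir scheme in this family (commonly called Algorithm R); the stream and sample size below are illustrative:

```python
import random

def reservoir_sample(stream, k):
    # Keep the first k items; thereafter, replace a uniformly chosen
    # reservoir slot with item i (0-based) with probability k / (i + 1).
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = random.randrange(i + 1)  # uniform on 0..i
            if j < k:
                reservoir[j] = item
    return reservoir

# Works on a generator: the population size is never needed.
print(reservoir_sample(iter(range(10**6)), 5))
```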
the limiting empirical distribution function $F_{\mathbf{X}}$ by $F_{\mathbf{X}}(x)=\lim_{n\to \infty}\frac{1}{n}\sum_{i=1}^{n}I(X_i\le x)$. (This is the Cesàro limit of the indicator functions. In cases where the Cesàro limit does not exist this function can actually be defined as the Banach limit of the indicator functions, which is an extension of this limit. This latter limit always exists for sums of indicator functions, so that
the mode is 0 for $p=0$ and $n$ for $p=1$. Let $0<p<1$. We find $\frac{f(k+1)}{f(k)}=\frac{(n-k)p}{(k+1)(1-p)}$. From this follows: $f(k+1)<f(k)$ when $k>(n+1)p-1$, $f(k+1)=f(k)$ when $k=(n+1)p-1$, and $f(k+1)>f(k)$ when $k<(n+1)p-1$. So when $(n+1)p-1$
the number of red elements in a sample of given size will vary by sample and hence is a random variable whose distribution can be studied. That distribution depends on the numbers of red and black elements in the full population. For a simple random sample with replacement, the distribution is a binomial distribution. For a simple random sample without replacement, one obtains a hypergeometric distribution. Several efficient algorithms for simple random sampling have been developed. A naive algorithm
the odds of a given pair being 01 or 10 are equal. Exchangeable random variables arise in the study of U-statistics, particularly in the Hoeffding decomposition. Exchangeability is a key assumption of the distribution-free inference method of conformal prediction.

Binomial distribution

In probability theory and statistics, the binomial distribution with parameters n and p
the posterior mean estimator becomes $\widehat{p}_b=\frac{x+1}{n+2}$. (A posterior mode should just lead to the standard estimator.) This method is called the rule of succession, which was introduced in the 18th century by Pierre-Simon Laplace. When relying on the Jeffreys prior, the prior is $\operatorname{Beta}(\alpha=\frac{1}{2},\beta=\frac{1}{2})$, which leads to
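A small sketch comparing the standard estimator with the Laplace and Jeffreys posterior means in a rare-event case (the observed counts are illustrative):

```python
x, n = 0, 20  # zero successes observed in 20 trials

p_mle = x / n                     # standard estimator: 0, often unrealistic
p_laplace = (x + 1) / (n + 2)     # uniform prior, rule of succession
p_jeffreys = (x + 0.5) / (n + 1)  # Jeffreys prior Beta(1/2, 1/2)
print(p_mle, p_laplace, p_jeffreys)  # 0.0, ~0.0455, ~0.0238
```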
the probability of choosing the same individual twice is low. Survey methodology textbooks generally consider simple random sampling without replacement as the benchmark to compute the relative efficiency of other sampling approaches. An unbiased random selection of individuals is important so that if many samples were drawn, the average sample would accurately represent the population. However, this does not guarantee that
the probability that there are at most k successes. Since $\Pr(X\ge k)=F(n-k;n,1-p)$, these bounds can also be seen as bounds for the upper tail of the cumulative distribution function for k ≥ np. Hoeffding's inequality yields the simple bound $F(k;n,p)\le \exp\left(-2n\left(p-\tfrac{k}{n}\right)^2\right)$, which
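Assuming SciPy for the exact CDF, a quick check that the Hoeffding bound holds (and how loose it is) for illustrative parameters:

```python
from math import exp
from scipy.stats import binom

n, p, k = 100, 0.5, 40  # lower tail, k <= np
exact = binom.cdf(k, n, p)
bound = exp(-2 * n * (p - k / n) ** 2)  # exp(-2n(p - k/n)^2)
print(exact, bound)  # ~0.028 <= ~0.135: the bound holds but is loose
```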
the same probability of being achieved (regardless of positions of successes within the sequence). There are $\binom{n}{k}$ such sequences, since the binomial coefficient $\binom{n}{k}$ counts the number of ways to choose the positions of the k successes among
the same probability of selection. If a systematic pattern is introduced into random sampling, it is referred to as "systematic (random) sampling". An example would be if the students in the school had numbers attached to their names ranging from 0001 to 1000, and we chose a random starting point, e.g. 0533, and then picked every 10th name thereafter to give us our sample of 100 (starting over with 0003 after reaching 0993). In this sense, this technique
the same rate p) is given by the probability mass function $f(k,n,p)=\Pr(X=k)=\binom{n}{k}p^k(1-p)^{n-k}$ for k = 0, 1, 2, ..., n, where $\binom{n}{k}=\frac{n!}{k!(n-k)!}$ is the binomial coefficient. The formula can be understood as follows: $p^kq^{n-k}$ is the probability of obtaining the sequence of n independent Bernoulli trials in which k trials are "successes" and the remaining n − k trials result in "failure". Since the trials are independent with probabilities remaining constant between them, any sequence of n trials with k successes (and n − k failures) has
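A direct transcription of the mass function, applied to the biased-coin example used elsewhere in this article (exactly 4 heads in 6 tosses with p = 0.3):

```python
from math import comb

def binomial_pmf(k, n, p):
    # Pr(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
    return comb(n, k) * p**k * (1 - p) ** (n - k)

print(binomial_pmf(4, 6, 0.3))  # 0.059535
```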
the sample size (n) and the population (N): 1. In the case that any given person can only be selected once (i.e., after selection a person is removed from the selection pool): $P=\frac{n}{N}=\frac{100}{1000}=10\%$. 2. In the case that any selected person is returned to the selection pool (i.e., can be picked more than once): $P=1-\left(1-\frac{1}{N}\right)^n=1-\left(\frac{999}{1000}\right)^{100}\approx 9.5\%$. This means that every student in the school has in any case approximately a 1 in 10 chance of being selected using this method. Further, any combination of 100 students has
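Both probabilities can be reproduced directly; the numbers match the school example above:

```python
N, n = 1000, 100  # population and sample size from the example

p_without_replacement = n / N              # exactly 1 in 10
p_with_replacement = 1 - (1 - 1 / N) ** n  # ~0.0952, just under 1 in 10
print(p_without_replacement, p_with_replacement)
```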
the sequence, the remaining element is known. Exchangeable sequences have some basic covariance and correlation properties which mean that they are generally positively correlated. For infinite sequences of exchangeable random variables, the covariance between the random variables is equal to the variance of the mean of the underlying distribution function. For finite exchangeable sequences the covariance
the subsets that are arranged as a single column are eligible for selection. There are also ways of constructing multistage sampling that are not SRS, while the final sample will be epsem. For example, systematic random sampling produces a sample for which each individual unit has the same probability of inclusion, but different sets of units have different probabilities of being selected. Samples that are epsem are self-weighting, meaning that
the table is filled in up to n/2 values. This is because for k > n/2, the probability can be calculated by its complement as $f(k,n,p)=f(n-k,n,1-p)$. Looking at the expression f(k, n, p) as a function of k, there is a k value that maximizes it. This k value can be found by calculating $\frac{f(k+1,n,p)}{f(k,n,p)}=\frac{(n-k)p}{(k+1)(1-p)}$ and comparing it to 1. There is always an integer M that satisfies $(n+1)p-1\le M<(n+1)p$. f(k, n, p) is monotone increasing for k < M and monotone decreasing for k > M, with
the theorem show that in any infinite sequence of exchangeable random variables, the random variables are conditionally independent and identically-distributed, given the underlying distributional form. This theorem is stated briefly below. (De Finetti's original theorem only showed this to be true for random indicator variables, but this was later extended to encompass all sequences of random variables.) Another way of putting this
the urn is empty. Let $X_i=1$ if the red marble is drawn on the i-th trial and 0 otherwise. A finite sequence that achieves the lower covariance bound cannot be extended to a longer exchangeable sequence. The von Neumann extractor is a randomness extractor that depends on exchangeability: it gives a method to take an exchangeable sequence of 0s and 1s (Bernoulli trials), with some probability p of 0 and $q=1-p$ of 1, and produce
was introduced by William Ernest Johnson in his 1924 book Logic, Part III: The Logical Foundations of Science. Exchangeability is equivalent to the concept of statistical control introduced by Walter Shewhart also in 1924. The property of exchangeability is closely related to the use of independent and identically distributed (i.i.d.) random variables in statistical models. A sequence of random variables that are i.i.d., conditional on some underlying distributional form,