The Stanford Sleepiness Scale (SSS), developed by William C. Dement and colleagues in 1972, is a one-item self-report questionnaire that measures levels of sleepiness throughout the day. The scale has been validated for adult populations and is generally used to track overall alertness at each hour of the day. The SSS is used in both research and clinical settings to assess the level of intervention needed or the effectiveness of a specific treatment, so that a client's progress can be compared over time.
Reliability refers to whether the scores are reproducible. Unless otherwise specified, the reliability scores and values come from studies done with a United States population sample. Validity describes the evidence that an assessment tool measures what it is supposed to measure. Unless otherwise specified, the validity scores and values come from studies done with a United States population sample. The SSS
A mean or a standard deviation. If a population exactly follows a known and defined distribution, for example the normal distribution, then a small set of parameters can be measured which provide a comprehensive description of the population, and can be considered to define a probability distribution for the purposes of extracting samples from this population. A "parameter" is to a population as
A "statistic" is to a sample; that is to say, a parameter describes the true value calculated from the full population (such as the population mean), whereas a statistic is an estimated measurement of the parameter based on a sample (such as the sample mean). Thus a "statistical parameter" can be more specifically referred to as a population parameter. Suppose that we have an indexed family of distributions. If
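To make the parameter/statistic distinction concrete, the Python sketch below builds a hypothetical population (normally distributed values with an arbitrarily chosen mean of 170 and standard deviation of 10), treats the mean of all its members as the parameter, and estimates that parameter with the mean of a random sample. The population, the sample size of 200, the seeds and the helper name `sample_mean` are illustrative assumptions, not values from any study.

```python
import random
import statistics

# Hypothetical population of 10,000 measurements (mean 170, SD 10 chosen arbitrarily).
pop_rng = random.Random(1)
population = [pop_rng.gauss(170.0, 10.0) for _ in range(10_000)]
population_mean = statistics.fmean(population)  # parameter: uses every member


def sample_mean(values, n, rng):
    """Statistic: mean of a simple random sample of n values drawn without replacement."""
    return statistics.fmean(rng.sample(values, n))


estimate = sample_mean(population, 200, random.Random(42))  # statistic from 200 members
print(f"parameter (population mean) = {population_mean:.2f}")
print(f"statistic (sample mean)     = {estimate:.2f}")
```

Rerunning with a different sampling seed changes the statistic but never the parameter, which is the asymmetry the text describes.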
A limit on the overall validity of a test. A test that is not perfectly reliable cannot be perfectly valid, either as a means of measuring attributes of a person or as a means of predicting scores on a criterion. While a reliable test may provide useful valid information, a test that is not reliable cannot possibly be valid. For example, if a set of weighing scales consistently measured the weight of an object as 500 grams over
A research study or for treatment intervention. Since the development of the SSS, other more specific and more recently developed sleepiness rating scales have appeared, such as the Epworth Sleepiness Scale, which is more commonly used in other populations. Because the SSS is available only in English, it is not widely used in other populations. The primary limitation of
Is reasonable to assume that errors are equally likely to be positive or negative, and that they are not correlated with true scores or with errors on other tests. It is assumed that:
1. Mean error of measurement = 0
2. True scores and errors are uncorrelated
3. Errors on different measures are uncorrelated
Reliability theory shows that the variance of obtained scores is simply the sum of
Is reasonable to assume that the effect will not be as strong with alternate forms of the test as with two administrations of the same test. However, this technique has its disadvantages: 3. Split-half method: This method treats the two halves of a measure as alternate forms. It provides a simple solution to the problem that the parallel-forms method faces: the difficulty in developing alternate forms. It involves: The correlation between these two split halves
Is that measurement errors are essentially random. This does not mean that errors arise from random processes. For any individual, an error in measurement is not a completely random event. However, across a large number of individuals, the causes of measurement error are assumed to be so varied that measurement errors act as random variables. If errors have the essential characteristics of random variables, then it
Is the characteristic of a set of test scores that relates to the amount of random error from the measurement process that might be embedded in the scores. Scores that are highly reliable are precise, reproducible, and consistent from one testing occasion to another. That is, if the testing process were repeated with a group of test takers, essentially the same results would be obtained. Various kinds of reliability coefficients, with values ranging between 0.00 (much error) and 1.00 (no error), are usually used to indicate
Is used in estimating the reliability of the test. This half-test reliability estimate is then stepped up to the full test length using the Spearman–Brown prediction formula. There are several ways of splitting a test to estimate reliability. For example, a 40-item vocabulary test could be split into two subtests, the first one made up of items 1 through 20 and the second made up of items 21 through 40. However,
Is used to estimate the reliability of the test. This method provides a partial solution to many of the problems inherent in the test-retest reliability method. For example, since the two forms of the test are different, the carryover effect is less of a problem. Reactivity effects are also partially controlled, although taking the first test may change responses to the second test. However, it
The dependent variables are related to the independent variables. During an election, there may be specific percentages of voters in a country who would vote for each particular candidate – these percentages would be statistical parameters. It is impractical to ask every voter before an election occurs what their candidate preferences are, so a sample of voters will be polled, and a statistic (also called an estimator) – that is,
The Stanford Sleepiness Scale is that it is a self-report measure; because of this, levels of sleepiness may be over- or under-reported owing to personal biases.
Reliability (statistics)
In statistics and psychometrics, reliability is the overall consistency of a measure. A measure is said to have a high reliability if it produces similar results under consistent conditions: "It
The absence of error. Errors of measurement are composed of both random error and systematic error. They represent the discrepancies between scores obtained on tests and the corresponding true scores. This conceptual breakdown is typically represented by the simple equation $X = T + E$, where $X$ is the observed score, $T$ the true score and $E$ the error of measurement. The goal of reliability theory is to estimate errors in measurement and to suggest ways of improving tests so that errors are minimized. The central assumption of reliability theory
The amount of error in the scores." For example, measurements of people's height and weight are often extremely reliable. There are several general classes of reliability estimates: Reliability does not imply validity. That is, a reliable measure that is measuring something consistently is not necessarily measuring what you want it to measure. For example, while there are many reliable tests of specific abilities, not all of them would be valid for predicting, say, job performance. While reliability does not imply validity, reliability does place
The attribute being measured. These factors include: The goal of estimating reliability is to determine how much of the variability in test scores is due to measurement errors and how much is due to variability in true scores (true value). A true score is the replicable feature of the concept being measured. It is the part of the observed score that would recur across different measurement occasions in
The concept of reliability from a single index to a function called the information function. The IRT information function is inversely related to the conditional standard error of measurement at any given trait level: the standard error equals one over the square root of the information. The goal of estimating reliability is to determine how much of the variability in test scores is due to errors in measurement and how much is due to variability in true scores. Four practical strategies have been developed that provide workable methods of estimating test reliability. 1. Test-retest reliability method: directly assesses
The degree to which test scores are consistent from one test administration to the next. It involves: The correlation between scores on the first test and the scores on the retest is used to estimate the reliability of the test using the Pearson product-moment correlation coefficient: see also item-total correlation. 2. Parallel-forms method: The key to this method is the development of alternate test forms that are equivalent in terms of content, response processes and statistical characteristics. For example, alternate forms exist for several tests of general intelligence, and these tests are generally seen as equivalent. With
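The test-retest estimate is the Pearson product-moment correlation between the two sets of scores. The sketch below computes it directly from its definition (sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations). The helper name `pearson_r` and the score lists are hypothetical; the same function would serve the parallel-forms method by correlating form A scores with form B scores.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two paired lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired lists with at least two scores each")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations (numerator of r).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each list (denominator of r).
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical scores for six people who took the same test twice.
first_test = [12, 15, 9, 20, 17, 11]
retest = [13, 14, 10, 19, 18, 10]
print(f"test-retest reliability estimate: {pearson_r(first_test, retest):.3f}")
```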
The distribution is known exactly. The family of chi-squared distributions can be indexed by the number of degrees of freedom: the number of degrees of freedom is a parameter for the distributions, and so the family is thereby parameterized. In statistical inference, parameters are sometimes taken to be unobservable, and in this case the statistician's task is to estimate or infer what they can about
The effects of inconsistency on the accuracy of measurement. The basic starting point for almost all theories of test reliability is the idea that test scores reflect the influence of two sorts of factors:
1. Consistency factors: stable characteristics of the individual or the attribute that one is trying to measure.
2. Inconsistency factors: features of the individual or the situation that can affect test scores but have nothing to do with
The following: Where a probability distribution has a domain over a set of objects that are themselves probability distributions, the term concentration parameter is used for quantities that index how variable the outcomes would be. Quantities such as regression coefficients are statistical parameters in the above sense because they index the family of conditional probability distributions that describe how
The index is also a parameter of the members of the family, then the family is a parameterized family. Among parameterized families of distributions are the normal distributions, the Poisson distributions, the binomial distributions, and the exponential family of distributions. For example, the family of normal distributions has two parameters, the mean and the variance: if those are specified,
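As an illustration of a parameterized family, Python's standard-library `statistics.NormalDist` selects one member of the normal family from two numbers. Note that it is parameterized by the mean and the standard deviation (`sigma`) rather than the variance, which is `sigma` squared; the specific values below are arbitrary choices for the sketch.

```python
from statistics import NormalDist

# Each (mu, sigma) pair picks out exactly one member of the normal family.
standard = NormalDist(mu=0.0, sigma=1.0)
shifted = NormalDist(mu=100.0, sigma=15.0)  # arbitrary illustrative values

# Once the parameters are fixed, every probability statement is determined.
print(standard.cdf(1.96))     # probability of a value below 1.96, about 0.975
print(shifted.cdf(130.0))     # 130 is 2 SDs above 100, so this equals standard.cdf(2.0)
print(shifted.mean, shifted.variance)  # the parameters, with variance = sigma ** 2
```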
The kind of statistical procedure being carried out (for example, the number of degrees of freedom in a Pearson's chi-squared test). Even if a family of distributions is not specified, quantities such as the mean and variance can generally still be regarded as statistical parameters of the population, and statistical procedures can still attempt to make inferences about such population parameters. Parameters are given names appropriate to their roles, including
The methods to estimate reliability include test-retest reliability, internal consistency reliability, and parallel-test reliability. Each method comes at the problem of figuring out the source of error in the test somewhat differently. It was well known to classical test theorists that measurement precision is not uniform across the scale of measurement. Tests tend to distinguish better for test-takers with moderate trait levels and worse among high- and low-scoring test-takers. Item response theory extends
The parallel test model it is possible to develop two forms of a test that are equivalent in the sense that a person's true score on form A would be identical to their true score on form B. If both forms of the test were administered to a number of people, differences between scores on form A and form B may be due to errors in measurement only. It involves: The correlation between scores on the two alternate forms
The parameter based on a random sample of observations taken from the full population. Estimators of a set of parameters of a specific distribution are often measured for a population, under the assumption that the population is (at least approximately) distributed according to that specific probability distribution. In other situations, parameters may be fixed by the nature of the sampling procedure used or
The percentage of the sample of polled voters – will be measured instead. The statistic, along with an estimation of its accuracy (known as its sampling error), is then used to make inferences about the true statistical parameters (the percentages of all voters). Similarly, in some forms of testing of manufactured products, rather than destructively testing all products, only a sample of products is tested. Such tests gather statistics supporting an inference that
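For the polling example, a common estimate of the sampling error of a sample proportion $\hat p$ from $n$ respondents is the standard error $\sqrt{\hat p(1-\hat p)/n}$. The sketch below assumes a hypothetical electorate with a true support of 52% and treats each sampled voter as an independent draw, which approximates a simple random sample from a large electorate; the 95% interval uses the normal approximation. The helper name `poll_estimate`, the sample size and the seed are illustrative.

```python
import math
import random


def poll_estimate(responses):
    """Return the sample proportion and its estimated standard error sqrt(p(1-p)/n)."""
    n = len(responses)
    p_hat = sum(responses) / n
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat, se


# Hypothetical electorate: 52% support candidate A (the parameter we cannot observe).
true_share = 0.52
rng = random.Random(7)
# Sample of 1,000 voters, coded 1 = supports A, 0 = does not.
sample = [1 if rng.random() < true_share else 0 for _ in range(1_000)]

p_hat, se = poll_estimate(sample)
low, high = p_hat - 1.96 * se, p_hat + 1.96 * se  # approximate 95% interval
print(f"estimate={p_hat:.3f}  SE={se:.4f}  95% CI=({low:.3f}, {high:.3f})")
```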
The reliability coefficient is defined as the ratio of true score variance $\sigma_T^2$ to the total variance of test scores $\sigma_X^2$:
$\rho_{xx'} = \frac{\sigma_T^2}{\sigma_X^2}$
Or, equivalently, it is one minus the ratio of the error score variance $\sigma_E^2$ to the observed score variance:
$\rho_{xx'} = 1 - \frac{\sigma_E^2}{\sigma_X^2}$
Unfortunately, there is no way to directly observe or calculate the true score, so a variety of methods are used to estimate the reliability of a test. Some examples of
The responses from the first half may be systematically different from responses in the second half due to an increase in item difficulty and fatigue. In splitting a test, the two halves would need to be as similar as possible, both in terms of their content and in terms of the probable state of the respondent. The simplest method is to adopt an odd-even split, in which the odd-numbered items form one half of
The test and the even-numbered items form the other. This arrangement guarantees that each half will contain an equal number of items from the beginning, middle, and end of the original test.
True value
In statistics, as opposed to its general use in mathematics, a parameter is any quantity of a statistical population that summarizes or describes an aspect of the population, such as
The true weight, then the scale would be very reliable, but it would not be valid (as the returned weight is not the true weight). For the scale to be valid, it should return the true weight of an object. This example demonstrates that a perfectly reliable measure is not necessarily valid, but that a valid measure necessarily must be reliable. In practice, testing measures are never perfectly consistent. Theories of test reliability have been developed to estimate
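The weighing-scale example can be expressed numerically: consistency shows up as a small spread of repeated readings, while validity (here, accuracy) shows up as a small difference between the average reading and the true weight. The readings, the 1,000-gram object and the helper name `summarize` are hypothetical, chosen to mimic one scale that is always about 500 grams high and one that is centred on the true weight.

```python
import statistics

TRUE_WEIGHT = 1000.0  # grams; hypothetical object


def summarize(readings, true_weight):
    """Return (spread, bias): consistency and systematic error of repeated readings."""
    spread = statistics.pstdev(readings)             # small spread = consistent
    bias = statistics.fmean(readings) - true_weight  # near zero = accurate
    return spread, bias


# Hypothetical readings: one scale is consistent but about 500 g high,
# the other is equally consistent and centred on the true weight.
biased_scale = [1500.2, 1499.9, 1500.1, 1500.0, 1499.8]
accurate_scale = [1000.1, 999.9, 1000.0, 1000.2, 999.8]

for name, readings in (("biased", biased_scale), ("accurate", accurate_scale)):
    spread, bias = summarize(readings, TRUE_WEIGHT)
    print(f"{name}: spread={spread:.2f} g, bias={bias:+.1f} g")
```

Both scales have the same spread, so they are equally reliable, but only the second returns the true weight.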
The variance of true scores plus the variance of errors of measurement:
$\sigma_X^2 = \sigma_T^2 + \sigma_E^2$
where $\sigma_X^2$, $\sigma_T^2$ and $\sigma_E^2$ are the variances of observed scores, true scores and errors. This equation suggests that test scores vary as the result of two factors:
1. Variability in true scores
2. Variability due to errors of measurement
The reliability coefficient $\rho_{xx'}$ provides an index of the relative influence of true and error scores on attained test scores. In its general form,
Was developed to measure subjective sleepiness in research and clinical settings. Other instruments measuring sleepiness tend to examine the general experience of sleepiness over the course of a day, but the SSS met a need for a scale measuring sleepiness at specific moments in time. Because it can be used to evaluate specific moments, the scale can be used repeatedly at different time intervals in