In statistics, linear regression is a model that estimates the linear relationship between a scalar response (dependent variable) and one or more explanatory variables (regressors or independent variables). A model with exactly one explanatory variable is a simple linear regression; a model with two or more explanatory variables is a multiple linear regression. This term is distinct from multivariate linear regression, which predicts multiple correlated dependent variables rather than a single dependent variable.
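A minimal sketch of simple linear regression, assuming ordinary least squares as the fitting criterion: the slope is $\hat\beta_1 = \sum_i (x_i - \bar x)(y_i - \bar y) / \sum_i (x_i - \bar x)^2$ and the intercept is $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$. The function name `fit_simple_linear_regression` and the toy data are illustrative, not taken from any library.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centered sum of squares of x and centered cross-products of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the explanatory variable has zero variance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


b0, b1 = fit_simple_linear_regression([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)
```

For data lying exactly on $y = 1 + 2x$ the fit recovers intercept 1 and slope 2. Python 3.10 and later also provide `statistics.linear_regression` for this one-predictor case.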
Trend line can refer to: a linear regression in statistics; the result of trend estimation in statistics; or a trend line (technical analysis), a tool in technical analysis.
$a^{2}/2$ for small values of $a$, and approximates a straight line with slope $\delta$ for large values of $a$. While the above is the most common form, other smooth approximations of the Huber loss function also exist. For classification purposes,
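The Pseudo-Huber loss is commonly written as $L_\delta(a) = \delta^2\left(\sqrt{1 + (a/\delta)^2} - 1\right)$. The short Python sketch below evaluates it and checks numerically that it behaves like $a^2/2$ near zero and grows with slope close to $\delta$ far from zero; the function name and the test points are illustrative assumptions.

```python
import math


def pseudo_huber_loss(a, delta=1.0):
    """Pseudo-Huber loss delta**2 * (sqrt(1 + (a / delta)**2) - 1); smooth everywhere."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    return delta * delta * (math.sqrt(1.0 + (a / delta) ** 2) - 1.0)


# Near the minimum the loss is close to the quadratic a**2 / 2.
small = 0.01
print(pseudo_huber_loss(small), small ** 2 / 2)

# Far from the minimum, the increase per unit step approaches delta.
delta = 2.0
step_slope = pseudo_huber_loss(1001.0, delta) - pseudo_huber_loss(1000.0, delta)
print(step_slope, delta)
```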
a closed-form solution, robustness with respect to heavy-tailed distributions, and theoretical assumptions needed to validate desirable statistical properties such as consistency and asymptotic efficiency.

Huber loss

In statistics, the Huber loss is a loss function used in robust regression that is less sensitive to outliers in data than the squared error loss. A variant for classification
a median-unbiased estimator (in the one-dimensional case, and a geometric median-unbiased estimator in the multi-dimensional case). The squared loss has the disadvantage that it tends to be dominated by outliers: when summing over a set of $a$'s (as in $\sum_{i=1}^{n} L(a_i)$),
$a$ often refers to the residuals, that is, to the difference between the observed and predicted values, $a = y - f(x)$, so the former can be expanded to

$$L_\delta(y, f(x)) = \begin{cases} \frac{1}{2}\,(y - f(x))^2 & \text{for } |y - f(x)| \le \delta, \\ \delta \cdot \left(|y - f(x)| - \frac{1}{2}\delta\right) & \text{otherwise.} \end{cases}$$

The Huber loss is the convolution of the absolute value function with the rectangular function, scaled and translated. Thus it "smooths out" the former's corner at
a common value for the given predictor variable. This is the only interpretation of "held fixed" that can be used in an observational study. The notion of a "unique effect" is appealing when studying a complex system where multiple interrelated components influence the response variable. In some cases, it can literally be interpreted as the causal effect of an intervention that is linked to the value of
a group of predictor variables, say, $\{x_1, x_2, \dots, x_q\}$, a group effect $\xi(\mathbf{w})$ is defined as a linear combination of their parameters,

$$\xi(\mathbf{w}) = w_1\beta_1 + w_2\beta_2 + \dots + w_q\beta_q,$$

where $\mathbf{w} = (w_1, w_2, \dots, w_q)^{\intercal}$
a predictor variable. However, it has been argued that in many cases multiple regression analysis fails to clarify the relationships between the predictor variables and the response variable when the predictors are correlated with each other and are not assigned following a study design. Numerous extensions of linear regression have been developed, which allow some or all of the assumptions underlying
a study design, the comparisons of interest may literally correspond to comparisons among units whose predictor variables have been "held fixed" by the experimenter. Alternatively, the expression "held fixed" can refer to a selection that takes place in the context of data analysis. In this case, we "hold a variable fixed" by restricting our attention to the subsets of the data that happen to have
a variant of the Huber loss called modified Huber is sometimes used. Given a prediction $f(x)$ (a real-valued classifier score) and a true binary class label $y \in \{+1, -1\}$, the modified Huber loss is defined as

$$L(y, f(x)) = \begin{cases} \max(0, 1 - y\,f(x))^2 & \text{for } y\,f(x) > -1, \\ -4\,y\,f(x) & \text{otherwise.} \end{cases}$$

The term $\max(0, 1 - y\,f(x))$
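A minimal sketch of the modified Huber loss, assuming the usual piecewise form: $\max(0, 1 - y\,f(x))^2$ when the margin $y\,f(x) > -1$, and $-4\,y\,f(x)$ otherwise. The function name is hypothetical; note that both pieces equal 4 at margin $-1$, so the loss is continuous.

```python
def modified_huber_loss(y, score):
    """Modified Huber loss for a label y in {+1, -1} and a real-valued score f(x)."""
    if y not in (1, -1):
        raise ValueError("label must be +1 or -1")
    margin = y * score
    if margin > -1:
        # Quadratically smoothed hinge: zero once the margin reaches 1.
        return max(0.0, 1.0 - margin) ** 2
    # Linear branch for badly misclassified points; equals 4 at margin -1,
    # matching the quadratic branch there.
    return -4.0 * margin


for score in (2.0, 0.5, 0.0, -1.0, -3.0):
    print(score, modified_huber_loss(+1, score))
```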
is a framework for modeling response variables that are bounded or discrete. This is used, for example: Generalized linear models allow for an arbitrary link function, $g$, that relates the mean of the response variable(s) to the predictors: $E(Y) = g^{-1}(XB)$. The link function is often related to
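To make the generalized linear model relation $E(Y) = g^{-1}(XB)$ concrete, the sketch below computes the mean response for one observation. The logit link is an assumed example (its inverse maps the $(-\infty, \infty)$ range of the linear predictor into $(0, 1)$), the function names are hypothetical, and the coefficients are made up; passing the identity function as the inverse link recovers ordinary linear regression.

```python
import math


def inverse_logit(eta):
    """Inverse of the logit link g(mu) = log(mu / (1 - mu)); maps real eta into (0, 1)."""
    # Branch on the sign of eta so math.exp never overflows.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def glm_mean(x_row, coefficients, inverse_link=inverse_logit):
    """Mean response E(Y | x) = g^{-1}(x^T beta) for a single observation."""
    if len(x_row) != len(coefficients):
        raise ValueError("x_row and coefficients must have the same length")
    eta = sum(x * b for x, b in zip(x_row, coefficients))  # linear predictor
    return inverse_link(eta)


# Hypothetical coefficients: intercept -1.0 (paired with the constant 1.0) and slope 0.5.
x_row = [1.0, 2.0]
beta = [-1.0, 0.5]
print(glm_mean(x_row, beta))                            # logit link: a probability
print(glm_mean(x_row, beta, inverse_link=lambda e: e))  # identity link: linear regression
```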
is a generalization of simple linear regression to the case of more than one independent variable, and a special case of general linear models, restricted to one dependent variable. The basic model for multiple linear regression is

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \ldots + \beta_p X_{ip} + \varepsilon_i$$

for each observation $i = 1, \ldots, n$. In the formula above we consider $n$ observations of one dependent variable and $p$ independent variables. Thus, $Y_i$
is a meaningful effect. It can be accurately estimated by its minimum-variance unbiased linear estimator $\hat{\xi}_A = \frac{1}{q}(\hat{\beta}_1' + \hat{\beta}_2' + \dots + \hat{\beta}_q')$, even when individually none of
is a special group effect with weights $w_1 = 1$ and $w_j = 0$ for $j \neq 1$, but it cannot be accurately estimated by $\hat{\beta}'_1$. It
is a weight vector satisfying $\sum_{j=1}^{q} |w_j| = 1$. Because of the constraint on $w_j$, $\xi(\mathbf{w})$ is also referred to as a normalized group effect. A group effect $\xi(\mathbf{w})$ has an interpretation as
is also not a meaningful effect. In general, for a group of $q$ strongly correlated predictor variables in an APC arrangement in the standardized model, group effects whose weight vectors $\mathbf{w}$ are at or near the centre of the simplex $\sum_{j=1}^{q} w_j = 1$ ($w_j \geq 0$) are meaningful and can be accurately estimated by their minimum-variance unbiased linear estimators. Effects with weight vectors far away from
is also sometimes used. The Huber loss function describes the penalty incurred by an estimation procedure $f$. Huber (1964) defines the loss function piecewise by

$$L_\delta(a) = \begin{cases} \frac{1}{2}a^2 & \text{for } |a| \le \delta, \\ \delta \cdot \left(|a| - \frac{1}{2}\delta\right) & \text{otherwise.} \end{cases}$$

This function is quadratic for small values of $a$ and linear for large values, with equal values and slopes of the different sections at the two points where $|a| = \delta$. The variable
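The Huber loss with threshold $\delta$ is usually written as $\tfrac12 a^2$ for $|a| \le \delta$ and $\delta(|a| - \tfrac12\delta)$ otherwise, where $a$ is typically a residual $y - f(x)$. A minimal Python sketch of that piecewise definition follows; the function names and the residual values (including one deliberately large outlier) are illustrative assumptions.

```python
def huber_loss(a, delta=1.0):
    """Huber loss: a**2 / 2 for |a| <= delta, delta * (|a| - delta / 2) beyond."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    if abs(a) <= delta:
        return 0.5 * a * a
    return delta * (abs(a) - 0.5 * delta)


def total_loss(residuals, loss):
    """Sum a per-residual loss over residuals a_i = y_i - f(x_i)."""
    return sum(loss(r) for r in residuals)


residuals = [0.1, -0.2, 0.3, 10.0]  # the last residual plays the role of an outlier
print(total_loss(residuals, lambda r: r * r))  # squared loss: dominated by the outlier
print(total_loss(residuals, huber_loss))       # Huber loss: outlier enters only linearly
```

On these residuals the squared-loss total is about 100.14, almost all of it from the single residual of 10, while the Huber total is about 9.57.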
is captured by $x_j$. In this case, including the other variables in the model reduces the part of the variability of $y$ that is unrelated to $x_j$, thereby strengthening the apparent relationship with $x_j$. The meaning of the expression "held fixed" may depend on how the values of the predictor variables arise. If the experimenter directly sets the values of the predictor variables according to
is meaningful when the latter is. Thus meaningful group effects of the original variables can be found through meaningful group effects of the standardized variables.

In Dempster–Shafer theory, or a linear belief function in particular, a linear regression model may be represented as a partially swept matrix, which can be combined with similar matrices representing observations and other assumed normal distributions and state equations. The combination of swept or unswept matrices provides an alternative method for estimating linear regression models. A large number of procedures have been developed for parameter estimation and inference in linear regression. These methods differ in computational simplicity of algorithms, presence of
is minimized. For example, it is common to use the sum of squared errors $\|\boldsymbol{\varepsilon}\|_2^2$ as a measure of $\boldsymbol{\varepsilon}$ for minimization. Consider a situation where a small ball is being tossed up in
is probable. Group effects provide a means to study the collective impact of strongly correlated predictor variables in linear regression models. Individual effects of such variables are not well-defined as their parameters do not have good interpretations. Furthermore, when the sample size is not large, none of their parameters can be accurately estimated by the least squares regression due to
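A group effect of standardized variables is a linear combination $\xi(\mathbf{w}) = \sum_j w_j \beta_j'$ with $\sum_j |w_j| = 1$, and the average group effect uses $w_j = 1/q$. The sketch below assumes the least squares estimates $\hat\beta_j'$ of the standardized model have already been computed; the function names and the numbers in `beta_hat_std` are hypothetical.

```python
def group_effect_estimate(beta_hat_std, weights):
    """Estimate xi(w) = sum_j w_j * beta'_j from estimated standardized coefficients."""
    if len(beta_hat_std) != len(weights):
        raise ValueError("one weight per coefficient is required")
    # Normalized group effects require sum_j |w_j| == 1.
    if abs(sum(abs(w) for w in weights) - 1.0) > 1e-9:
        raise ValueError("weights must satisfy sum(|w_j|) == 1")
    return sum(w * b for w, b in zip(weights, beta_hat_std))


def average_group_effect_estimate(beta_hat_std):
    """Average group effect: equal weights 1/q at the centre of the simplex."""
    q = len(beta_hat_std)
    return group_effect_estimate(beta_hat_std, [1.0 / q] * q)


beta_hat_std = [0.9, -0.2, 0.5]  # hypothetical estimates for a strongly correlated group
print(average_group_effect_estimate(beta_hat_std))
print(group_effect_estimate(beta_hat_std, [1.0, 0.0, 0.0]))  # single-variable weights
```

Both calls return a number, but in a strongly correlated APC group only weights at or near the centre of the simplex, such as the average, correspond to a meaningful and accurately estimable effect; the weights $(1, 0, 0)$ simply reproduce $\hat\beta_1'$, which is not reliable in that setting.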
is regressed on $C$. It is often used where the variables of interest have a natural hierarchical structure, such as in educational statistics, where students are nested in classrooms, classrooms are nested in schools, and schools are nested in some administrative grouping, such as a school district. The response variable might be a measure of student achievement such as a test score, and different covariates would be collected at
is still assumed, with a matrix $B$ replacing the vector $\beta$ of the classical linear regression model. Multivariate analogues of ordinary least squares (OLS) and generalized least squares (GLS) have been developed. "General linear models" are also called "multivariate linear models". These are not the same as multivariable linear models (also called "multiple linear models"). Various models have been created that allow for heteroscedasticity, i.e.
is strongly correlated with other predictor variables, it is improbable that $x_j$ can increase by one unit with other variables held constant. In this case, the interpretation of $\beta_j$ becomes problematic as it is based on an improbable condition, and the effect of $x_j$ cannot be evaluated in isolation. For
is the $i$-th observation of the dependent variable, $X_{ij}$ is the $i$-th observation of the $j$-th independent variable, $j = 1, 2, \ldots, p$. The values $\beta_j$ represent parameters to be estimated, and $\varepsilon_i$ is the $i$-th independent identically distributed normal error. In the more general multivariate linear regression, there is one equation of the above form for each of $m > 1$ dependent variables that share
is the least squares estimator of $\beta_j'$. In particular, the average group effect of the $q$ standardized variables is

$$\xi_A = \frac{1}{q}(\beta_1' + \beta_2' + \dots + \beta_q'),$$

which has an interpretation as the expected change in $y'$ when all $x_j'$ in
the $\beta_j'$ can be accurately estimated by $\hat{\beta}_j'$. Not all group effects are meaningful or can be accurately estimated. For example, $\beta_1'$
the $q$ variables via testing $H_0: \xi_A = 0$ versus $H_1: \xi_A \neq 0$, and (3) characterizing the region of the predictor variable space over which predictions by
the mean squared error (MSE) as the cost on a dataset that has many large outliers can result in a model that fits the outliers more than the true data, because MSE assigns greater importance to large errors. Cost functions that are robust to outliers should therefore be used if the dataset has many large outliers. Conversely, the least squares approach can be used to fit models that are not linear models. Thus, although
the data. Most commonly, the conditional mean of the response given the values of the explanatory variables (or predictors) is assumed to be an affine function of those values; less commonly, the conditional median or some other quantile is used. Like all forms of regression analysis, linear regression focuses on the conditional probability distribution of the response given the values of
the multicollinearity problem. Nevertheless, there are meaningful group effects that have good interpretations and can be accurately estimated by the least squares regression. A simple way to identify these meaningful group effects is to use an all positive correlations (APC) arrangement of the strongly correlated variables, under which pairwise correlations among these variables are all positive, and standardize all $p$ predictor variables in
Trend line - Misplaced Pages Continue
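The standardization to mean zero and length one, together with the check for an APC arrangement, can be sketched in Python as follows. "Length one" is taken here to mean unit Euclidean norm after centering, under which the Pearson correlation of two columns is the dot product of their standardized versions; the helper names and the data are illustrative assumptions.

```python
import math


def standardize(column):
    """Center a predictor to mean zero and scale it to unit Euclidean length."""
    n = len(column)
    mean = sum(column) / n
    centered = [v - mean for v in column]
    length = math.sqrt(sum(v * v for v in centered))
    if length == 0.0:
        raise ValueError("a constant column cannot be standardized")
    return [v / length for v in centered]


def correlation(u, v):
    """Pearson correlation, computed as the dot product of the standardized columns."""
    return sum(a * b for a, b in zip(standardize(u), standardize(v)))


def is_apc(columns):
    """True if every pairwise correlation within the group is positive."""
    q = len(columns)
    return all(correlation(columns[i], columns[j]) > 0
               for i in range(q) for j in range(i + 1, q))


x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 4.1, 5.9, 8.2, 9.8]
x3 = [-v for v in x2]               # negatively correlated with x1
print(is_apc([x1, x2]), is_apc([x1, x3]))
print(is_apc([x1, [-v for v in x3]]))  # flipping the sign of x3 restores APC
```

Flipping the sign of a variable is how a strongly correlated group is brought into an APC arrangement, and standardizing leaves every correlation unchanged.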
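the transpose, so that $\mathbf{x}_i^{\mathsf T}\boldsymbol{\beta}$ is the inner product between vectors $\mathbf{x}_i$ and $\boldsymbol{\beta}$. Often these $n$ equations are stacked together and written in matrix notation as

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},$$

where $\mathbf{y}$ is the vector of responses $y_1, \ldots, y_n$, $\mathbf{X}$ is the matrix whose $i$-th row is $\mathbf{x}_i^{\mathsf T}$, and $\boldsymbol{\varepsilon}$ is the vector of errors $\varepsilon_1, \ldots, \varepsilon_n$. Fitting a linear model to a given data set usually requires estimating the regression coefficients $\boldsymbol{\beta}$ such that the error term $\boldsymbol{\varepsilon} = \mathbf{y} - \mathbf{X}\boldsymbol{\beta}$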
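Minimizing $\|\mathbf{y} - \mathbf{X}\boldsymbol\beta\|_2^2$ leads to the normal equations $\mathbf{X}^{\mathsf T}\mathbf{X}\boldsymbol\beta = \mathbf{X}^{\mathsf T}\mathbf{y}$. The dependency-free sketch below solves them by Gaussian elimination with partial pivoting; it assumes $\mathbf{X}$ has a leading column of ones for the intercept and full column rank, and `ols_fit` is a hypothetical name. Beyond small, well-conditioned problems, a QR- or SVD-based solver is the safer choice because forming $\mathbf{X}^{\mathsf T}\mathbf{X}$ squares the condition number.

```python
def ols_fit(X, y):
    """Least-squares coefficients minimizing ||y - X beta||^2 via the normal equations."""
    n = len(X)
    if n == 0 or len(y) != n:
        raise ValueError("X and y must be non-empty with the same number of rows")
    p = len(X[0])
    # Augmented normal-equation matrix [X^T X | X^T y], one row per coefficient.
    A = [
        [sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
        + [sum(X[k][i] * y[k] for k in range(n))]
        for i in range(p)
    ]
    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; check for collinear columns")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    beta = [0.0] * p
    for i in reversed(range(p)):
        tail = sum(A[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = (A[i][p] - tail) / A[i][i]
    return beta


# Data generated exactly from y = 1 + 2*x1 - x2 (first column is the intercept).
X = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0], [1.0, 2.0, 1.0]]
y = [1.0, 3.0, 0.0, 2.0, 4.0]
beta = ols_fit(X, y)
residuals = [yi - sum(xij * bj for xij, bj in zip(row, beta)) for row, yi in zip(X, y)]
print(beta)
```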
the Pseudo-Huber loss function transitions from L2 loss for values close to the minimum to L1 loss for extreme values, and the steepness at extreme values can be controlled by the $\delta$ value. The Pseudo-Huber loss function ensures that derivatives are continuous for all degrees. It is defined as

$$L_\delta(a) = \delta^2\left(\sqrt{1 + (a/\delta)^2} - 1\right).$$

As such, this function approximates
the air and then we measure its heights of ascent $h_i$ at various moments in time $t_i$. Physics tells us that, ignoring the drag, the relationship can be modeled as

$$h_i = \beta_1 t_i + \beta_2 t_i^2 + \varepsilon_i,$$

where $\beta_1$ determines the initial velocity of the ball, $\beta_2$ is proportional to the standard gravity, and $\varepsilon_i$ is due to measurement errors. Linear regression can be used to estimate the values of $\beta_1$ and $\beta_2$ from
the basic model to be relaxed. The simplest case of a single scalar predictor variable $x$ and a single scalar response variable $y$ is known as simple linear regression. The extension to multiple and/or vector-valued predictor variables (denoted with a capital $X$) is known as multiple linear regression, also known as multivariable linear regression (not to be confused with multivariate linear regression). Multiple linear regression
the boundary of this uniform neighborhood, the Huber loss function has a differentiable extension to an affine function at points $a = -\delta$ and $a = \delta$. These properties allow it to combine much of the sensitivity of the mean-unbiased, minimum-variance estimator of
the central role of the linear predictor $\beta' x$ as in the classical linear regression model. Under certain conditions, simply applying OLS to data from a single-index model will consistently estimate $\beta$ up to a proportionality constant. Hierarchical linear models (or multilevel regression) organize the data into a hierarchy of regressions, for example where $A$ is regressed on $B$, and $B$
the centre are not meaningful, as such weight vectors represent simultaneous changes of the variables that violate the strong positive correlations of the standardized variables in an APC arrangement. As such, they are not probable. These effects also cannot be accurately estimated. Applications of the group effects include (1) estimation and inference for meaningful group effects on the response variable, (2) testing for "group significance" of
the centred $y$ and $x_j'$ be the standardized $x_j$. Then, the standardized linear regression model is

$$y' = \beta_1' x_1' + \beta_2' x_2' + \dots + \beta_p' x_p' + \varepsilon.$$

Parameters $\beta_j$ in the original model, including $\beta_0$, are simple functions of $\beta_j'$ in
the classroom, school, and school district levels. Errors-in-variables models (or "measurement error models") extend the traditional linear regression model to allow the predictor variables $X$ to be observed with error. This error causes standard estimators of $\beta$ to become biased. Generally, the form of bias is an attenuation, meaning that the effects are biased toward zero. In a multiple linear regression model, the parameter $\beta_j$ of predictor variable $x_j$ represents
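The attenuation caused by measurement error in a predictor can be seen in a small simulation. The sketch assumes a true slope of 2, a predictor with variance 1, and independent measurement error with variance 1; under those assumptions the classical errors-in-variables result predicts that the naive least-squares slope shrinks by the factor $\operatorname{var}(x) / (\operatorname{var}(x) + \operatorname{var}(u)) = 1/2$. The seed, sample size, noise levels and the helper name `ols_slope` are arbitrary illustrative choices.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (model with an intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 20_000
true_slope = 2.0
x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]            # predictor, variance 1
y = [true_slope * x + rng.gauss(0.0, 0.5) for x in x_true]  # response with its own noise
x_observed = [x + rng.gauss(0.0, 1.0) for x in x_true]      # measurement error, variance 1

print(ols_slope(x_true, y))      # roughly the true slope
print(ols_slope(x_observed, y))  # roughly half of it: attenuated toward zero
```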
the data strongly influence the performance of different estimation methods: A fitted linear regression model can be used to identify the relationship between a single predictor variable $x_j$ and the response variable $y$ when all the other predictor variables in the model are "held fixed". Specifically, the interpretation of $\beta_j$ is the expected change in $y$ for a one-unit change in $x_j$ when
the dependent variable $y$ and the vector of regressors $\mathbf{x}$ is linear. This relationship is modeled through a disturbance term or error variable $\varepsilon$, an unobserved random variable that adds "noise" to the linear relationship between the dependent variable and regressors. Thus the model takes the form

$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i = \mathbf{x}_i^{\mathsf T}\boldsymbol{\beta} + \varepsilon_i, \qquad i = 1, \ldots, n,$$

where $^{\mathsf T}$ denotes
the distribution of the response, and in particular it typically has the effect of transforming between the $(-\infty, \infty)$ range of the linear predictor and the range of the response variable. Some common examples of GLMs are: Single index models allow some degree of nonlinearity in the relationship between $x$ and $y$, while preserving
the errors for different response variables may have different variances. For example, weighted least squares is a method for estimating linear regression models when the response variables may have different error variances, possibly with correlated errors. (See also Weighted linear least squares and Generalized least squares.) Heteroscedasticity-consistent standard errors is an improved method for use with uncorrelated but potentially heteroscedastic errors. The generalized linear model (GLM)
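A minimal sketch of weighted least squares for a single predictor: each squared residual is multiplied by a weight $w_i$, commonly $1/\sigma_i^2$ when the error variances $\sigma_i^2$ are known or estimated. The function name, the data and the assumed error standard deviations are illustrative; this sketch handles only uncorrelated errors (correlated errors call for generalized least squares).

```python
def weighted_least_squares_line(xs, ys, weights):
    """Fit y = b0 + b1 * x by minimizing sum_i w_i * (y_i - b0 - b1 * x_i)**2."""
    if not (len(xs) == len(ys) == len(weights)):
        raise ValueError("xs, ys and weights must have equal length")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    total = sum(weights)
    # Weighted means, then weighted centered sums.
    mx = sum(w * x for w, x in zip(weights, xs)) / total
    my = sum(w * y for w, y in zip(weights, ys)) / total
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 20.0]          # the last point is far off the line y = 1 + 2x
sigmas = [1.0, 1.0, 1.0, 1000.0]    # ...but is assumed to have a much larger error s.d.
weights = [1.0 / s ** 2 for s in sigmas]
print(weighted_least_squares_line(xs, ys, weights))
print(weighted_least_squares_line(xs, ys, [1.0] * 4))  # equal weights: ordinary least squares
```

With equal weights the outlying point drags the slope up to 5.9; down-weighting it by its assumed variance brings the fit back close to $y = 1 + 2x$.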
the expected change in $y$ when variables in the group $x_1, x_2, \dots, x_q$ change by the amount $w_1, w_2, \dots, w_q$, respectively, at
the following two broad categories: Linear regression models are often fitted using the least squares approach, but they may also be fitted in other ways, such as by minimizing the "lack of fit" in some other norm (as with least absolute deviations regression), or by minimizing a penalized version of the least squares cost function as in ridge regression ($L^2$-norm penalty) and lasso ($L^1$-norm penalty). Use of
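For a single centered predictor, ridge ($L^2$) and lasso ($L^1$) penalized least squares have closed-form solutions, which the sketch below uses to show how the penalty shrinks the slope. It assumes the objective $\sum_i (y_i - \bar y - b(x_i - \bar x))^2 + \lambda P(b)$ with $P(b) = b^2$ or $|b|$ and an unpenalized intercept; libraries often scale the squared-error term differently (for example by $1/(2n)$), so $\lambda$ values are not directly comparable across implementations. The function name is hypothetical.

```python
import math


def penalized_slope(xs, ys, lam, penalty="ridge"):
    """Return (intercept, slope) for single-predictor ridge or lasso on centered data."""
    if lam < 0:
        raise ValueError("lam must be non-negative")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if penalty == "ridge":
        # Minimizer of sxx*b**2 - 2*sxy*b + lam*b**2.
        slope = sxy / (sxx + lam)
    elif penalty == "lasso":
        # Soft-thresholding: shrink |sxy| by lam/2, and stop at exactly zero.
        slope = math.copysign(max(abs(sxy) - lam / 2.0, 0.0), sxy) / sxx
    else:
        raise ValueError("penalty must be 'ridge' or 'lasso'")
    # The intercept is not penalized, so the line passes through (mean x, mean y).
    return my - slope * mx, slope


xs, ys = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]
for lam in (0.0, 5.0, 20.0):
    print(lam, penalized_slope(xs, ys, lam, "ridge"), penalized_slope(xs, ys, lam, "lasso"))
```

With a large enough penalty the lasso slope becomes exactly zero, while the ridge slope only approaches zero.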
the group effect also reduces to an individual effect. A group effect $\xi(\mathbf{w})$ is said to be meaningful if the underlying simultaneous changes of the $q$ variables $(x_1, x_2, \dots, x_q)^{\intercal}$
the individual effect of $x_j$. It has an interpretation as the expected change in the response variable $y$ when $x_j$ increases by one unit with other predictor variables held constant. When $x_j$
the information in $x_j$, so that once that variable is in the model, there is no contribution of $x_j$ to the variation in $y$. Conversely, the unique effect of $x_j$ can be large while its marginal effect is nearly zero. This would happen if the other covariates explained a great deal of the variation of $y$, but they mainly explain variation in a way that is complementary to what
the least squares estimated model are accurate. A group effect of the original variables $\{x_1, x_2, \dots, x_q\}$ can be expressed as a constant times a group effect of the standardized variables $\{x_1', x_2', \dots, x_q'\}$. The former
Linear regression

In linear regression, the relationships are modeled using linear predictor functions whose unknown model parameters are estimated from
the mean (using the quadratic loss function) and the robustness of the median-unbiased estimator (using the absolute value function). The Pseudo-Huber loss function can be used as a smooth approximation of the Huber loss function. It combines the best properties of L2 squared loss and L1 absolute loss by being strongly convex when close to the target/minimum and less steep for extreme values. The scale at which
the measured data. This model is non-linear in the time variable, but it is linear in the parameters $\beta_1$ and $\beta_2$; if we take regressors $\mathbf{x}_i = (x_{i1}, x_{i2}) = (t_i, t_i^2)$, the model takes on the standard form

$$h_i = \mathbf{x}_i^{\mathsf T}\boldsymbol{\beta} + \varepsilon_i.$$

Standard linear regression models with standard estimation techniques make a number of assumptions about the predictor variables,
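The tossed-ball example can be worked through numerically. The sketch below builds the regressors $(t_i, t_i^2)$ and fits $h_i = \beta_1 t_i + \beta_2 t_i^2$ without an intercept by solving the 2-by-2 normal equations with Cramer's rule. It uses noise-free synthetic heights generated from an assumed initial velocity of 10 m/s and $g = 9.81$ m/s², so the fit should return $\beta_1 = 10$ and $\beta_2 = -g/2$; the function name and data are illustrative.

```python
def fit_ball_trajectory(times, heights):
    """Least-squares fit of h = beta1 * t + beta2 * t**2 (no intercept)."""
    # Normal equations for regressors x1 = t and x2 = t**2:
    # [s11 s12] [beta1]   [s1y]
    # [s12 s22] [beta2] = [s2y]
    s11 = sum(t * t for t in times)
    s12 = sum(t ** 3 for t in times)
    s22 = sum(t ** 4 for t in times)
    s1y = sum(t * h for t, h in zip(times, heights))
    s2y = sum(t * t * h for t, h in zip(times, heights))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("need at least two distinct nonzero time points")
    beta1 = (s1y * s22 - s12 * s2y) / det
    beta2 = (s11 * s2y - s12 * s1y) / det
    return beta1, beta2


times = [0.2, 0.4, 0.6, 0.8, 1.0]
heights = [10.0 * t - 4.905 * t * t for t in times]  # synthetic, noise-free
beta1, beta2 = fit_ball_trajectory(times, heights)
print(beta1, -2.0 * beta2)  # estimated initial velocity and gravity
```

With real measurements the $\varepsilon_i$ are nonzero and the estimates only approximate the physical values.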
the model so that they all have mean zero and length one. To illustrate this, suppose that $\{x_1, x_2, \dots, x_q\}$ is a group of strongly correlated variables in an APC arrangement and that they are not strongly correlated with predictor variables outside the group. Let $y'$ be
the origin. Two very commonly used loss functions are the squared loss, $L(a) = a^2$, and the absolute loss, $L(a) = |a|$. The squared loss function results in an arithmetic mean-unbiased estimator, and the absolute-value loss function results in
the other covariates are held fixed, that is, the expected value of the partial derivative of $y$ with respect to $x_j$. This is sometimes called the unique effect of $x_j$ on $y$. In contrast, the marginal effect of $x_j$ on $y$ can be assessed using a correlation coefficient or simple linear regression model relating only $x_j$ to $y$; this effect is the total derivative of $y$ with respect to $x_j$. Care must be taken when interpreting regression results, as some of
the predictors, rather than on the joint probability distribution of all of these variables, which is the domain of multivariate analysis. Linear regression is also a type of machine learning algorithm, more specifically a supervised algorithm, that learns from labelled datasets and maps the data points to the best-fitting linear function, which can then be used for prediction on new datasets. Linear regression
the regressors may not allow for marginal changes (such as dummy variables, or the intercept term), while others cannot be held fixed (recall the example from the introduction: it would be impossible to "hold $t_i$ fixed" and at the same time change the value of $t_i^2$). It is possible for the unique effect to be nearly zero even when the marginal effect is large. This may imply that some other covariate captures all
the response variable $y$ is still a scalar. Another term, multivariate linear regression, refers to cases where $y$ is a vector, i.e., the same as general linear regression. The general linear model considers the situation when the response variable is not a scalar (for each observation) but a vector, $\mathbf{y}_i$. Conditional linearity of $E(\mathbf{y} \mid \mathbf{x}_i) = \mathbf{x}_i^{\mathsf T} B$
the response variable and their relationship. Numerous extensions have been developed that allow each of these assumptions to be relaxed (i.e. reduced to a weaker form), and in some cases eliminated entirely. Generally these extensions make the estimation procedure more complex and time-consuming, and may also require more data in order to produce an equally precise model. The following are the major assumptions made by standard linear regression models with standard estimation techniques (e.g. ordinary least squares): Violations of these assumptions can result in biased estimates of $\beta$, biased standard errors, and untrustworthy confidence intervals and significance tests. Beyond these assumptions, several other statistical properties of
the same set of explanatory variables and hence are estimated simultaneously with each other:

$$Y_{ij} = \beta_{0j} + \beta_{1j} X_{i1} + \beta_{2j} X_{i2} + \ldots + \beta_{pj} X_{ip} + \varepsilon_{ij}$$

for all observations indexed as $i = 1, \ldots, n$ and for all dependent variables indexed as $j = 1, \ldots, m$. Nearly all real-world regression models involve multiple predictors, and basic descriptions of linear regression are often phrased in terms of the multiple regression model. Note, however, that in these cases
the same time with other variables (not in the group) held constant. It generalizes the individual effect of a variable to a group of variables in that (i) if $q = 1$, then the group effect reduces to an individual effect, and (ii) if $w_i = 1$ and $w_j = 0$ for $j \neq i$, then
the sample mean is influenced too much by a few particularly large $a$-values when the distribution is heavy-tailed: in terms of estimation theory, the asymptotic relative efficiency of the mean is poor for heavy-tailed distributions. As defined above, the Huber loss function is strongly convex in a uniform neighborhood of its minimum $a = 0$; at
the standardized model. A group effect of $\{x_1', x_2', \dots, x_q'\}$ is

$$\xi(\mathbf{w}) = w_1\beta_1' + w_2\beta_2' + \dots + w_q\beta_q',$$

and its minimum-variance unbiased linear estimator is

$$\hat{\xi}(\mathbf{w}) = w_1\hat{\beta}_1' + w_2\hat{\beta}_2' + \dots + w_q\hat{\beta}_q',$$

where $\hat{\beta}_j'$
the standardized model. The standardization of variables does not change their correlations, so $\{x_1', x_2', \dots, x_q'\}$ is a group of strongly correlated variables in an APC arrangement and they are not strongly correlated with other predictor variables in
the strongly correlated group increase by $(1/q)$th of a unit at the same time with variables outside the group held constant. With strong positive correlations and in standardized units, variables in the group are approximately equal, so they are likely to increase at the same time and in similar amount. Thus, the average group effect $\xi_A$
the terms "least squares" and "linear model" are closely linked, they are not synonymous. Given a data set $\{y_i, x_{i1}, \ldots, x_{ip}\}_{i=1}^{n}$ of $n$ statistical units, a linear regression model assumes that the relationship between
was the first type of regression analysis to be studied rigorously, and to be used extensively in practical applications. This is because models which depend linearly on their unknown parameters are easier to fit than models which are non-linearly related to their parameters and because the statistical properties of the resulting estimators are easier to determine. Linear regression has many practical uses. Most applications fall into one of