At the Mr Excel discussion forum the output of the worksheet function Linest(), compared with chart trend lines raised a question, where it is said that in Excel 2007 the trend line gives different (and incorrect) coefficients for a polynomial curve constrained to pass through the origin, and that the R^2 values reported by the two methods are different.

I haven’t been able to reproduce the problem of different output from the trend line and Linest for the regression coefficients, but it is true that if the regression line is constrained to pass through the origin then the R^2 value (The coefficient of determination) is different in the trend line results and Linest (see screen shot below). This is true both for polynomial results, as reported in the Mr Excel post, and for linear regression.

The Excel help is unhelpful in explaining this difference. More detail is given at the MSDN developer center, where the basis of the calculation of the R^2 value in Linest is described, but there is no explanation as to why the chart trend line gives different results, and I could find no other relevent discussion on the matter.

The clearest explanation I found was at the Curvefit.com site, where they had this to say, in relation to their own software:

“Why Prism doesn’t report r2 in constrained linear regression

Prism does not report r2 when you force the line through the origin (or any other point), because the calculations would be ambiguous. There are two ways to compute r2 when the regression line is constrained. As you saw in the previous section, r2 is computed by comparing the sum-of-squares from the regression line with the sum-of-squares from a model defined by the null hypothesis. With constrained regression, there are two possible null hypotheses. One is a horizontal line through the mean of all Y values. But this line doesn’t follow the constraint — it does not go through the origin. The other null hypothesis would be a horizontal line through the origin, far from most of the data.

Because r2 is ambiguous in constrained linear regression, Prism doesn’t report it. If you really want to know a value for r2, use nonlinear regression to fit your data to the equation Y=slope*X. Prism will report r2 defined the first way (comparing regression sum-of-squares to the sum-of-squares from a horizontal line at the mean Y value).”

Now it seems that the Excel Linest() function uses the second hypothesis for a constrained regression line, resulting in a higher R^2 value, compared with the unconstrained line. The chart trend line function on the other hand seems to use the first hypothesis.

This is confirmed in the calculation below:

It would have been nice if Microsoft could have explained that.

The examples shown above can be downloaded from: Linest Check.zip

Pingback: Daily Download 14: Curve Fitting 1 | Newton Excel Bach, not (just) an Excel Blog