## Using LINEST for non-linear curve fitting

Update 14 March 2020: See LinEstGap with non-linear functions for the latest version of Linest-Poly with new functions for non-linear curves allowing more convenient input and work with data with gaps errors, and/or hidden lines.

Update 28 June 2015:  Also see Using Linest for non-linear curve fitting examples, hints, and warnings for more examples of fitting exponential and polynomial curves using LinEst.

A frequent question on internet forums everywhere is how to do a least squares fit of a non-linear trend line to a set of data.  The most frequent answer is to plot the data on an XY (“scatter”) chart, and then use the “Fit Trendline” option, with the “display equation on chart” box checked.  The chart trendlines have the options of: Linear, Exponential, Logarithmic, Polynomial (up to order 6), and Power.  There is also a “Moving Average” option, but this does not provide a trendline equation.  The chart trendline solution is OK if what you want to do is display the trendline equation on a chart, but if you want to use the numbers in some further analysis, or even just display them elsewhere in the spreadsheet, or copy them to another document, it is far from convenient.  Fortunately it is straightforward to get the trendline equations (and other statistics) for each of the chart trendline types using the LINEST worksheet function.

Linest-poly.xls

The functions used for linear and polynomial trendlines are shown in the screenshot below (click image for full size view):

Linear and Polynomial Trendlines

Note that:

• The functions as displayed use named ranges (X_1 to X_3 and Y_1 to Y_3)
• The functions are entered as array functions to display all the return values; i.e. enter the function in a cell, select that cell and sufficient adjacent cells to display all the required values, press F2, press Ctrl-Shift-Enter.
• Alternatively the INDEX function may be used to return specific values; e.g. to return the b value from the linear example use =INDEX(LINEST(Y_1, X_1),2)
• Higher order polynomial functions may be returned by simply adding to the list of powers in the curly brackets (but note that this is often not a good idea because of “over-fitting“)

Functions for exponential, power, and logarithmic trendlines are shown below:

Exponential, Power and Logarithmic Trendlines

In this case the process is not quite so straightforward, because in most cases one or both of the values returned by the function must be modified to give the values shown in the chart trend lines.  For these lines it is possible to use either the LINEST function, or the LOGEST function, but since LOGEST simply calls LINEST internally, and provides little if any extra convenience, it does not seem to provide much value.  In these examples note that:

• Equations are in the form: y = a.e^bx (exponential), y = a.x^b (power) or y = b.ln(x) + a (logarithmic).  In each case in the examples the power factor (b) is shown in bold, and the constant term (a) is shown in bold and italic.
• The LOGEST function returns an equation of the form y = a.b^x
• The LINEST function will return exactly the same values if entered as =EXP(LINEST(LN(Yrange), XRange)), and this line is equivalent to the y = a.e^bx line returned by the chart.

Update 27 Jan 2011:

Coincidentally, Chandoo at Pointy Haired Dilbert is also running a series on estimating trend lines in Excel, which is well worth a look at: Are You Trendy

This entry was posted in Charts, Excel, Maths, Newton and tagged , , , , , . Bookmark the permalink.

### 91 Responses to Using LINEST for non-linear curve fitting

1. Scott Rogers says:

I would really like to understand how microsoft’s trend line for a +4th order polynomial’s trend line always comes out smooth? What do they do in their algorithm that is different from the Givens (Least Squares) method that almost any other curve fitting program can duplicate up to a 3rd order polynomial (well, the coefficients match at least), but after that MS Excel’s formula’s simply don’t match any other results???? And the kicker is that if you plot their solution, the trend line ALSO does not come out the same? MS Excel has become a “standard” for regression analysis in non-linear systems, so it would be really appreciated if they would provide insight into their algorithm’s.

Like

2. dougaj4 says:

Scott – can you give an example? With the same data I used for the curves up to cubic above (i.e. a circular quadrant with 11 points and radius 1), for a quartic I get:

-5.03771915 8.14914828 -4.68693650 0.64996956 0.99330532

with descending powers of x.

I get exactly the same coefficients from the chart trend line.

This is with Excel 2010, but I believe that the algorithm is unchanged since 2003. There are some acknowledged issues with earlier versions.

I’ll have a look with different software later, but that will probably have to wait to the weekend.

Like

3. Scott Rogers says:

I’m referring to polynomial trend lines greater than 4th order. I’m trying to develop a program to curve fit signals with very steep sideband slopes, and when I use Excel, the skirts are well adjusted to the curve (6th order polynomials in this example), but the trend line itself is incredibly smooth, and I have significant sign changes throughout the data, which should be reflected in the curve fit as well, and simply isn’t. When I compare polynomial coefficients, they too are significantly skewed, its almost as if MS Excel smooths the data, or performs some other optimization that I can not account for in my algorithm, however, if you take the trend line formula provided in Excel and plot it against the data in a different plotting tool (Matlab, LabVIEW, etc…) the curve fit line does not appear to resemble the plot in Excel. So, I’m really curious as to how accurate the curve fit in Excel truly is, it could be as simple as different accuracy weighting as well, but I am struggling finding a solution that matches Excel.

Like

4. dougaj4 says:

Scott – having looked at it again, there does seem to be a bug in the Linest results with polynomials of order 6 and higher. I have just fitted a polynomyial to the function y = 1/(x-4.99) for 100 points between x = 5 and x = 6. I used Linest, Excel chart trendline fitting, and the Alglib PolynomialFit routine.

I found that up to 5th order they all gave esentially the same results (but with rounding differences becoming significant at 5th order). For 6th order the chart line and the Alglib line were very close, but the Linest line still followed the same form as the 5th order (i.e. maxima and minima were at about the same positions). For 7th order the Alglib line changed (as you would expect), but the Linest line again stayed close to the 5th order position, and the chart line was past its limit.

I will write this up and post a UDF with the Alglib routine in the next few days. Drop me an e-mail (dougaj4 at google) if you would like a copy of the spreadsheet as it stands. You might also like to have a look at the Alglib site which has some information about how they approach the problem.

Like

5. dougaj4 says:

The only discussion of a similar problem I found in a quick search was here:

http://www.officekb.com/Uwe/Forum.aspx/excel-chart/13526/Excel-2007-Polynomial-Order-Incorrect

The responses were not very helpful unfortunately.

Like

6. Lori Miller says:

Doug – I don’t think it’s a bug but a consequence of collinearity see http://support.microsoft.com/kb/828533. A column may be excluded if the sum of squared residuals on a regression of the other predictor variables is small, see http://en.wikipedia.org/wiki/Multicollinearity. Unfortunately, it’s not clear from the description exactly how linest chooses in which order to exclude columns.

An equivalent of LINEST(Y,X) that doesn’t exclude columns is:

=MMULT(MMULT(TRANSPOSE(Y),X^N),MINVERSE(MMULT(TRANSPOSE(X^N),X^N)))

where N={1,0} for a linear fit, N={3,2,1,0} for a cubic fit, etc (X non-zero). However it’s not as numerically stable as the QR decomposition method that linest uses.

Like

7. Scott Rogers says:

Lori, would you mind ellaborating on your collinearity theory, I’m not understanding why that would cause the problem I am seeing.

Like

8. Lori Miller says:

Scott – The remarks were in response to Doug’s specific example, i can’t claim that they are necessarily relevant in your case but it does show that the algorithms used may become significant for higher order polynomials.

In my tests fitting a 6th degree polynomial gave an x^5 coefficient of zero for linest. This accords with the kb article since the results of linest(x^5,x^{1,2,3,4,6},,1) show that 1-R2=RSS/TSS<1e-16 so x^5 is omitted.

It’s also interesting to compare with fitted values calculated from mmult(X^N,transpose(b)) where b is given by the formula i posted above. For N={6,5,4,3,2,1,0} there is close agreement with linest but not with the chart line. If there is a follow up post a chart of this may shed more light.

Like

9. Jeff says:

I have used this function often for the pricing of fixed income securities, is is possible to modify it to allow for missing data? If for instance I do not have data in either the X or Y sets for specific row in my spreadsheet?

Like

10. Lori Miller says:

Jeff – To find a y-value for a given x-value using a fitted cubic curve, you can try:
=TREND(Yrange,Xrange^{1,2,3},Xvalue^{1,2,3})

Or to find an x-value for a given y-value you can try to find the root of the cubic using:
=IRR(LINEST(Yrange,Xrange^{1,2,3})-{0,0,0,1}*YValue)*1+1

These formulas are easily extended to other powers but you may be better off following the posts on splines for interpolating missing data.

Like

• Tom says:

Lori, can you explain more as to exactly what IRR does? Office just says that it is used to find the Internal Rate of Return. I think this is the solution I was needing for my problem but I want to make sure. I have data that I want to find a 5th order polynomial equation to. Then, given a Y-value, find the X-value. The answers that I’m getting using this formula seem to be reasonable, I just don’t know enough about the function of IRR to be confident its what I need.

Like

• dougaj4 says:

Tom – I’ll be interested to see Lori’s comments if she drops by, but my explanation of how it works is given here:
https://newtonexcelbach.wordpress.com/2011/12/01/linest-npv-irr-and-solving-polynomials/

Doug

Like

• Lori Miller says:

Doug – Thanks for the explanation which is much more comprehensive than mine would have been!

Tom – As a check, try inputting the X Value returned into the TREND formula above and verify that the result equals the original Y value. In fact, a slightly simpler verion of the formula is:
=IRR(LINEST(Yrange-YValue,Xrange^{1,2,3}))*1+1

I would like to add however, that while polynomial approximations can be very useful in theoretical analysis, there are rarely compelling reasons for choosing polynomial fits with empirical data. Models should be based on a priori assumptions as far as possible to avoid problems of data-snooping. If you need a nonlinear approximation for estimation purposes there are a multitutde of other smoothing methods that are often preferable.

Like

11. Lori Miller says:

Actually on rereading, the question is not directly about finding missing values but rather how to allow for missing values in data?

For a linear fit, SLOPE, INTERCEPT, RSQ and FORECAST skip rows containing blanks. For a nonlinear fit, it’s more challenging since for example LINEST(Y,X^{1,2,3}) errors if there are any blanks in the range. One approach is to try instead:
=LINEST(ISNUMBER(X)*Y,IF(X<>0,ISNUMBER(Y)*X^{0,1,2,3},0),0)

The results should match the values given in the chart trendlines. An extension that also allows for filtered data is to replace ISNUMBER(X) in the formula above by SUBTOTAL(3,OFFSET(X,ROW(X)-MIN(ROW(X)),,1)) and the same for ISNUMBER(Y). Similar substitutions can be applied to other formulas and you can assign names to the expressions for X and Y to keep things simple.

Clearly, there are other ways to achieve the same results, the obvious one being to make a copy without the rows containing the missing values, however this either needs to be done manually or a macro setup to do it for you each time which is less efficient.

Like

12. dougaj4 says:

Lori – thanks for the on-sheet solutions. I’m still trying to work out how the second one works!

Like

13. malcolm says:

I need to use linest to find the coefficients ‘a’ and ‘b’ that fit the curve y=a(1-exp(-bx)) to my data set. Judging by the comments here there are some clever people who would know how to do that, however I’m not one of them, it has me stumped! Thanks to anyone who can offer any help…

Like

14. dougaj4 says:

Malcolm – as far as i know the easiest way is using the Excel Solver, as given by electricpete at eng-tips:

To use Linest you have to be able to convert the function to a linear function of some functions of x.

Like

15. Georg says:

Hi all, multi-collinearity causes huge problems in polynomial regression if the range of x-values extends to values much larger than 1 in magnitude. For example, you might want to calculate the correlation coefficients between x, x^2, x^3 and so on if x = 20,21,22,23…,30; they are all close to 1. This can even cause fatal interpretation errors if the estimated confidence intervals are used for some kind of error propagation analysis without accounting for the large co-variances between the regression coefficients. The only way out of that trap is to use orthogonal (or even orthonormal) polynomials like those of Legendre as basis functions. As these are defined on the interval [-1;1], the variables have to be transformed before the regression takes place. And always mind the co-variances between the coefficients in an error propagation analysis, if you work with the coefficients for x, x^2…!
The usage of at least orthogonal polynomials is the only method that allows to reliably detect non-linear relationships far away from the origin, for example, when you want to do a non-linear regression of income on age (30 – 70) or so.

Like

• Lori Miller says:

Georg – Good points, some of these were alluded to in comments from the follow-up
post. For computing the coefficients, LINEST/TREND can be applied to data centered around the mean and the results are in close agreement to other high-precision polynomial regression algorithms. The QR/SVD decomposition methods for calculating least squares estimates can be seen as finite dimensional analogs to orthogonal polynomial expansions of L^2-functions.

Like

16. Rasm says:

I would like to see how others graphically show the col-linearity – I have just posted on http://www.excelfox.com how to get a correlation map using colors. The post is under the heading
“Using property ColorScaleCriteria color you cells” (in the download center) – what it shows is a correlation matrix of 3 wet chemistry assays (Y variables) and Absorbencies as the X data — so these are adjacent frequencies (X values) with a very high covariance.
Anybody who has examples of how to graphically show a map similar to what I just posted would be appreciated.

PS – I had to cut out data (max file size is 100kb) – so if you plot the X-values you will see ‘noise’ in the spectra.

Like

• Lori Miller says:

Rasm – I couldn’t log in to view your file, but for a visual plot of collinearity in three regressor variables I would plot one variable against a best fit linear combination of the other two. It’s insightful to do this for Doug’s polynomial example mentioned above.

Starting from a blank workbook, here’s a few steps to set up the chart and plot the data, no datasheets or code modules are required. You can just press Alt+F11 and enter sequentially in the immediate window, the corresponding UI commands should be self-evident.

``` charts.add activechart.seriescollection.newseries set s=activechart.SeriesCollection(1) s.Type=xlXYScatter names.add "x", [4.99+row(sheet1!1:101)/100] names.add "y", "=x^3" names.add "z", "=trend(x^3,x^{1,2})" s.formula = "=series(,sheet1!y,sheet1!z,1)" s.trendlines.add DisplayRSquared:=True s.trendlines(1).datalabel.numberformat="0.000000000000000" ```

This gives a near exact straight line with R^2=0.99999946. Choosing Debug>Add Watch with [linest(x^3,x^{1,2})] also gives the same value in the (3,1) element. Extending to [linest(x^6,x^{1,2,3,4,5})] gives R^2=1 exactly to 15dp, so x^6 coefficeint is dropped from the calculation of coefficients as it adds no more information.

To reduce the collinearity the first step is to center around the mean by changing 4.99 to -0.51 above this gives R^2=0.8401098. The second step is to transform the columns so they are uncorrelated, this is what Excel and other least squares methods do “behind the scenes”. Choosing the cubic Legendre polynomial in place of x^3 as below gives R^2=0.0019898.

``` names.add "x", [(x-average(x))*2] names.add "y", "=2.5*x^3-1.5*x" ```

Like

• Rasm says:

Lori
I will try your method – but I have several 100s X values – up 1050.
I do preprocess the data i.e. SNV and typically a 1st derivative (with a smooth and a gap) – next I mean center the data. I typically find the best model using PLS or MLR. The data I work with are spectra – that is why I have extreme col-linearity – the dependent variable is typical a concentration. But I do find the approach described in this threat very interesting. Again thanks for your reply – I will try your method – may give me some inspiration.

Like

17. Georg says:

In order to compare the extent of collinearity of two vectors V1 and V2 to our everyday experience of Euclidian sapce, it might help to calculate the angle A12 between them according to the formula
``` A12=180/PI()*ARCCOS(SUMPRODUCT(V1,V2)/SQRT(SUMPRODUCT(V1,V1)*SUMPRODUCT(V2,V2))) ```
A result of 0 means totally collinear and 90 means totally independent.

Like

18. asafa says:

Hi,
I want to find out the trendline for a set of data (x,Y) but I want to get the trendline in the form of sin(x) and cos(x). However, using the excel it is only possible to have it in the form of linear, power, .. but not the sin or cos(x). How you think I can solve it?

Like

• Georg says:

Well asafa, if you really meant A*sin(x)+B*cos(x), you could use linest because your model is linear in the parameters. But I guess you want to solve something like A*cos(B*x)+C*cos(B*x) in order to determine a spectral component. As this model is non-linear in the parameters, you have to use a non-linear least squares method, for example, Excel’s solver. My experience of 20 years of NLLSQ fits lets me strongly recommend to use VBA to interface to an external DLL that allows for suppying the derivatives of the model function with respect to the parameters analytically. Doug has some posts on how to interface to Fortran. The procedure for interfacing to C is quite similar. There are free NLLSQ routines available for download on the web (NetLib, AlgLib,…). You could use the free CAS Maxima, e.g., in order to determine the analytical derivatives if your models become more complicated than just being the sum of a sine and a cosine.

Like

19. dougaj4 says:

for post on using the Excel Solver and the Alglib routines for non-linear regression in Excel.

I agree with Georg that for anyone doing serious work on this the purpose written routines such as those from Alglib offer much better performance than using Solver.

Like

20. Hi everbody, great site!!!. I am working in forcasting project, and its so hard to read in the chart the exactly number of the trend, I am looking for information, how to get the values of the linear, polynomial, exponential etc. lines but in values, for example my data is A1:A20 so I would like to see on B the linear values, C logarithmic, D polynomial etc. how can I do this, please any help welcome.

Like

21. dougaj4 says:

Baum – Have a look at the download spreadsheet. It contains the examples shown in the screenshots which returns the data you want. If anything isn’t clear, please ask.

Like

22. matew says:

anyone knows any video link to see exactly how to get the coefficients trend lines.

Like

23. Zack says:

Hi,
My name is Zack. I was wondering if there was any way to program an equation with a missing variable in excel, and have it calculate for the missing variable.
I.E.) X=Vo*t + 1/2*a*t^2 when knowing what x, Vo, and a are equal to, to try and find what “t” is.
or I.E.) V=Vo + a*t knowing Vo, a, and t to find “V”
If anyone could tell me how to do this with several other equations, I would greatly appreciate it.
my email is “zackgane@yahoo.com” thank you

Like

24. dougaj4 says:

Zack – for your first example you can use the Excel Goal-seek function (under the Data tab in 2007/2010). Also have a look at the Solver which gives more control and can solve more complex problems (more than one unknown for instance). There are also some posts here with UDFs to solve polynomial equations, which will do the job more quickly and conveniently than Goal-seek if you have a lot of them:
https://newtonexcelbach.wordpress.com/2010/08/04/solving-cubic-and-quartic-equations-with-excel/
https://newtonexcelbach.wordpress.com/2011/01/13/solving-higher-order-polynomials/

For the second example you can just enter the formula in a cell to find V, but maybe you meant to find t? In that case you can use the same methods as for the first example, but some simple algebra will give the result: t = (V-Vo)/a. There is a formula solution to the first example as well; look up quadratic formula.

Like

25. shiva says:

Hi
This is shiva
I have some points that should fit in a bell curve..
What i do to get the ordinates for any point.
Email. shiv_yers@hotmail.com

Like

26. Hawwa says:

hi, i’m a final year civil engineering student and i’m doing a spreadsheet for the design of steel components using microsoft excel.. does anyone know how to generate the bending moment and shear force diagrams in excel..? thanks 🙂

Like

27. dougaj4 says:

Hawwa – have you tried the Internet?

There are a stack of programs that will generate bending moments and shear forces, with different levels of complexity.

If you are looking for a continuous beam analysis you could start here:
https://newtonexcelbach.wordpress.com/2012/05/03/using-udfs-continuous-beam-example-3/

Like

28. Scott says:

Noob question and perhaps attributed to a superficialy understanding of arrays in excel (amongst other things) but for the life of me, I can’t understand why in the logarithmic example, the forumlas in cells O55 and P55 return different values but appear to be exactly the same. What subtle point am I missing?

Like

• dougaj4 says:

Scott – yes it is the use of array formulas that is confusing if you are not used to them.

The two cells contain the same formula, but it returns an array rather than a single cell, in this case the array is 1 row x 2 columns. The values are a and b in the formula
a(ln(x)) + b. You can see that the values match those of the trend line in the chart starting at C50.

To return both results of the array:
– Enter the formula in O55 in the usual way.
– Select O55:P55
– Press F2
– Press Ctrl-shift-enter
You should now get both results, and the formula will display in the edit line with {} around it, the same in both cells: =LINEST(Y_3,LN(X_3)).

You can also select O55:P55 to start with and enter with ctrl-shift-enter.

Like

• Scott says:

Thanks for the quick reply 🙂 Last question would then be whether its possible to actually have all of this in a single cell (i.e. a single formula that would return the value of y in a cell by calculating a(ln(x)) + b) or do I essentially have to calculate a and b in separate cells like in your file and then do another cell to calculate y?

Like

• dougaj4 says:

Scott – it’s possible but the formula would get quite long. You can use the Index function to return any value from an array, the same as if it was a range, so:
=INDEX(LINEST(Y_3,LN(X_3)),2) will return the b value.
you would end up with something like:
=LINEST(Y_3,LN(X_3)) * LN(x) + INDEX(LINEST(Y_3,LN(X_3)),2)

But personally I’d rather return the results of the array formula in two cells, and use a third cell to get the desired result.

Like

• John says:

Doug,

Thanks for your answer (on 4-26-13 @ 921am) to Scott’s question. I had the same question. After following your directions, I was able to recreate the array with the right constants. But, when I input an x value of 2, I get 1.14, when I think I should ave gotten 9.82. Can you help me understand what I am doing wrong? My formula was =O51*(LN(A37)+P51).

Best,

John

Like

• dougaj4 says:

John – have another look at the original post, starting at “In this case the process is not quite so straightforward …”

The problem is that Linest will only fit a straight line to the data, so if we want a non-linear fit we have to convert the data into a form that is a straight line, get the coefficients for that line, then convert it back to the non-linear form we want.
In the case of the exponential curve we fit a straight line to ln(y) = ax + b, so y = Exp(ax + b) = Exp(b) * Exp(ax)
For the example in the spreadsheet, Exp(b) is in cell Q51, so to return the y value for x = 2 (in cell A37) you need:
= Q51 * EXP(O51 * A37)
Note that this returns 10.411, rather than 9.82, because you are fitting a smooth curve to scattered data. Looking at the Exponential chart you can see that in this case the exponential curve is not a good fit to the data. The power curve gives a much better fit, but even in that case extrapolating past x = 11 would probably rapidly diverge.

Like

29. Anil says:

Thanks for the information. I want to ask one thing that in power series trend line, if the x axis is logarithmic will there be any effect on the values. Please reply.

Like

• dougaj4 says:

The easiest way to check is to try it and see, but the answer is no, making the scale logarithmic only changes the way the graph is plotted, it doesn’t change the trend line results.

Like

30. David Richard Edward Chopp says:

Your blog is very useful. As a new VBA user I’m having touble to extend this NL fit case a simple on variable case. Have tried bchanging both AL_NLFit and AL_NLFitText without success. I hit a wall that says : cannot change part of an array. I even tried tp create a new one with the insert function for yusing UDF but could not get it to work.

Like

• dougaj4 says:

If you still have problems after reading that you could send a sample file to dougaj4 at the usual google mail address.

Like

• David Richard Edward Chopp says:

I did it and it works!
Thank you very much!

Like

31. Lenn says:

Thanks for the information! I have a question, how do I calculate the b, c and d for the Cubic curve estimation example? When I use the formula =Linest(Y_1,X_1,^{1,2,3}) I get the number presented as a (-2,08199), but how do I calculate the numbers b, c and d?

Like

• dougaj4 says:

Lenn – sorry for the delay in replying.
You have to enter the Linest function as an array function to display all the results.
If you already have the function entered and displaying the first result then:
-Select that cell and the three adjacent cells to the right.
-Press F2 to enter Edit mode.
-Press Ctrl-shift-enter
The four results should display in the selected cells.

Like

• Lenn says:

Thanks for the explanation! That makes sense!

Like

32. AR says:

Thank you for the file.

Like

33. keilajedrik says:

Anyway I’d like to make the point that Excel _does_ have issues in this area!
(1) If you look at the R² of a trendline other than polynomials in a chart, it is NOT the same one you
get when you use the resp. worksheet function.

(2) If your y-data are negative, you cannot use trendlines, nor the equivalent worksheet functions,
when you want to model exponential / power / logarithmic approximations.
Even though the real world knows examples of exponential functions that have negative values …

(3) Whether you use trendlines in charts or the linest / growth worksheet functions: the coefficients
you get for exponential / power / logarithmic approximations of your data series will (in general)
NOT be best fit!

By the way, OpenOffice / LibreOffice seem to have similar issues in their CALC components …
Feel free to drop comments if you are interested in examples.

Like

34. jithn says:

Like

• dougaj4 says:

Like

35. Anurag Sinha says:

I would like to know how this linest function works for a data set where the prices for a future contract with different days to maturity is given.
For eg : 1302.4 for 21 days
1312.6 for 40 days
1326.8 for 60 days.
I need a price for 30 days to maturity using Linest funtion.

Like

• dougaj4 says:
36. aswathy says:

i would like to know for a second degree polynomial with 5 variables, how to convert it to a third degree polynomial

Like

• dougaj4 says:

Can you supply an example. It isn’t clear to me what you want to do.

Like

37. First of all: thanks for your great two extensive posts on this topic.

Second: I’m going nuts.

I have the following dataset:

1995 1982467844
1996 2267857601
1997 2404755862
1998 2432389525
1999 2444858940
2000 2635654453
2001 2705084500
2002 2995833686
2003 3473777485
2004 3951721285
2005 4619625038
2006 4775887626
2007 5442306488
2008 6312303423
2009 6046918247
2010 6606480677
2011 7278256840
2012 7543592253
2013 8029216606
2014 8148406215

which for all my purposes approximates well enough to a 5-th order polynomial. Indeed, the fitting polynomial given by the excel plot has the coefficients

m5 2.48985094033181E+04
m4 -2.49631251621755E+08
m3 1.00111213806761E+12
m2 -2.00740350104956E+15
m1 2.01258764717869E+18
m0 -8.07110669655566E+20

however, if I try to obtain an estimation of a 5-th order polynomial from the data using:

=TREND(Y_1;X_1^{1\2\3\4\5};X_1^{1\2\3\4\5})

(where I have defined the ranges Y_1 & X_1 accordingly, and where I have to use \ as in my system the comma is the decimal separator) I obtain a much less exact prediction.

If I use INDEX(LINEST(Y_1;X_1^{1\2\3\4\5});1;n) to check the coeficients I obtain are:

m5 -0.205863309
m4 1033.029828
m3 -1382303.147
m2 0
m1 0
m0 1117576429140730

which are very diffferent! If I reconstruct the polynomial with those coefficients I obtain the estimation calculated with TREND.

This got me curious (and nuts) and I started playing around with the degree of the polynomial compating always with the plot fitting. I found that TREND and LINEST are giving me the same solution of the plot ONLY UNTIL a polynomial of 3rd degree!!!! That makes no sense and is not accurate enough for my purposes.

I’ve check my options section, but found no setting that could be related to this. The only different thing to what I’ve seen in several posts is the regional configuration of my machine, but that would be odd. I’m running Excel 2013.

Any ideas? Thanks before hand!

Like

• Update: is not related to the regional configuration (of course). Today I checked chaging to using the dot as decimal separator, and the error persist.

Cheers
Francisco

Like

• dougaj4 says:

There are a couple of issues here:
1. The Linest function and the chart trend line do give different results sometimes. Linest tends to return coefficients of zero for one or more of the higher powers. It is usually OK up to 4th or 5th degree polynomials, but as you found, for some data it diverges even at the 4th degree.
2. Your x values, as year numbers, are large numbers over a small range, so when you raise them to the 5th power and multiply by a large coefficient you get a very large number, and when you subtract two large numbers with a small difference you get a big loss in precision.

If you replace the x range with 0 to 19 (i.e. number of years from 1995) you will find that Linest and the chart trend line give the same result, which also gives a good fit to the data.

You might also like to try the xlwSciPy spreadsheet, which links to the Python SciPy library (see https://newtonexcelbach.wordpress.com/2016/10/02/xlwscipy-1-09-update-for-xlwings-0-10-and-scipy-0-18-1/). This has an xl_PolyFit function to return the best fit polynomial coefficients, and xlPolyVal to evaluate a polynomial for any x. Both are on the SolvePoly sheet. You need to install Python to use these functions.

One other point you might like to look at is a difference in the R^2 value returned by the trend line and Linest. Both are correct, but use different definitions. See https://newtonexcelbach.wordpress.com/2009/11/03/chart-trend-lines-and-the-linest-function/

Finally, it is worth considering if the 5th degree polynomial is really more accurate, which really depends on what you intend to do with the numbers.

Like

• THANKS A LOT! of course it was that. I totally forgot my numerical methods principles. Obviously software has made me lazy XD
Now is corrected and I’m a happy user again. Have a great 2017! Greetings from Chile!

Like

38. Larry Schuster says:

I think the trendline on the graph lies. I’ve created an Excel wrapper regression function that uses variations of the Linest function (Exponential, Power, Linear, Log, and Polynomial up to order 7) which is very accurate. I can plot the known y-values against the function derived values and see which ones match the known curve closest. If I then plot a trendline using the function derived values, the polynomial equation doesn’t have the same coefficients as the Linest function calculates. Additionally, plotting the graphical trendline equation derived y-values doesn’t even match the trendline itself even though it may show a R-squared value near 1. I’d be happy to provide the function to anyone who wants it, or perhaps, Mr. Jenkins can host it here.

Like

• dougaj4 says:

It is true that Linest and chart trendlines can give different results, but sometimes they are both wrong (as described in the post). If you really need to fit higher order polynomials there is a much better function here: https://newtonexcelbach.wordpress.com/2011/02/04/fitting-high-order-polynomials/

Also the two can return different R values, both of which are right. See:
https://newtonexcelbach.wordpress.com/2009/11/03/chart-trend-lines-and-the-linest-function/

Also note that if you want to use the trend line results, the output should be formatted to display 15 significant figures (but it’s more convenient to use Linest and get the results as values directly anyway).

Like

• Larry Schuster says:

Thanks for the reply. Most of my original post comes from applying many of the ideas you referenced. The link is to a spreadsheet that shows my concern. It uses linest in vba to develop the regression coefficients and statistics. I then graph the known independent values with a trendline and equation. That trendline matches up almost exactly with those values despite the fact that the trendline equation isn’t the same as equation that produces the EQN results. I graphed with them is the known values, and they match up with them using that different equation. I then use a version of Ian Huitson’s trendY function that I modified into a subroutine to get all the values of the graph trendline into a column, and I graph them. The values from the trendline bear no resemblance to any of the other lines. Hopefully, you can explain why.

Like

• dougaj4 says:

Larry – I had a look at your results, but I don’t know how you got your plotted trend line values from the coefficients on the chart. I re-ran the “get trendline values” macro, and got a perfect fit to the graph. I also generated the values myself, using the chart trendline coefficients, and got the same results. Also I got exactly the same coefficients using my xl_polyfit function, so for those values both the chart trend line and Linest give the same good results with a 6th order polynomial.

By the way, your results in Column D link to a function on your hard disk, so display as #NAME?

Like

• Larry Schuster says:

Sorry about that link. It’s fixed if you want to redownload. I surely can’t figure out why the “Get trendlines values” macro works for you and not for me. Are you using a 64 bit Excel?

Like

• dougaj4 says:

Also, for those using Python and xlwings, the xlSciPy spreadsheet:
https://newtonexcelbach.wordpress.com/2016/10/02/xlwscipy-1-09-update-for-xlwings-0-10-and-scipy-0-18-1/

has an xl_Polyfit function, as well as general purpose fitting functions.

Like

• dougaj4 says:

I’ll have another look. I’m using 32 bit Excel with 64 bit Windows 10.

I’ll have a look through the code and see if I can find anything that might be throwing out the results.

Like

• dougaj4 says:

I think I have found the problem:
The chart in your spreadsheet is a “line” chart, which doesn’t have x range values. The specified x range is treated as labels, even if they are numbers.
If you create a trend line, it treats the x values as being 1,2,3,…
You need to change the chart type to “scatter (XY)”, then the chart trend line will return the same results as Linest.
Also for the TrendY macro to work you should switch off the display of the R2 value (or modify the code so it ignores the R2).

I’m surprised that line charts will display a trend line. It seems like asking for trouble to me.

Like

• Larry Schuster says:

Yup, X-Y plot did the trick, everything works, and FWIW, the TrendY macro has a check for the R-sqr display. If its there, it turns it off, parses the trendline equation, and then turns it back on. Thanks for your help.

Like

• Larry Schuster says:

And another FWIW, I added drawing the trendline equation and R-sqr on the chart when you make a selection in the dropdown box.

Like

• dougaj4 says:

There is a discussion of this very problem at:
http://www.excelforum.com/excel-charting-and-pivots/537215-linear-trendline-wrong-equation.html

The conclusion is the same: if you are plotting XY data, then use an XY chart (otherwise known as a Scatter chart, even though you can connect the points with a line).

Note that a line chart will only display XY data correctly if the X range is equally spaced increasing points, and it will only display correct trend line data if the X range is a consecutive series of integers, starting at 1.

Like

39. Frankmaybe says:

For “Power” equations, when I type =LINEST(LN(Y_2),LN(X_2)) in the example spreadsheet, I can only get the power factor (b) and not the constant term (a). BTW, entering it as an array doesn’t change anything. I know I need to use =EXP(P53) but P53 is the figure I cannot reproduce.

Like

• dougaj4 says:

When you enter as an array formula, are you selecting both cells (O53 and P53) first?
– Select both cells
– Press F2
– Press ctrl-shift-enter

If that doesn’t work, please send your spreadsheet to dougaj4 at gmail, and I will have a look.

Like

• Frankmaybe says:

That worked! I don’t know why, but I guess I don’t care. Thank you so much.

Like

40. Julia says:

Hello,

I am very new to all of this and am trying to figure it out for a paper I am working on. I am trying to identify irregularities on a tooth surface, so I have a profile that effectively charts the z-coordinates (as the x-axis) by the y-coordinates (distance across the crown). I am trying to work out a method for identifying where the profile fluctuates too much from where it ‘should’ be – so where it is too depressed. A paper that was recently published identified defects as those where the tooth surface “deviates negatively by more than one moving standard deviation from a fitted sixth order polynomial along the shape of the tooth.” So I was trying to figure out using polynomial regression for my own analysis, and while I was trying to figure out how to take the linest-generated regression and use it as data points to calculate residual at each point, I came across this blog. Long explanation, but apart from still being a bit stumped on how to take the linest-generated information and use it for the further steps, I’m now concerned that polynomial regression might be a problem here. The paper I used specified a 6th order polynomial regression, and my data points are very large – for the y-coordinates into 5000 (um) and for the x-coordinates into the 100’s. Are you able to confirm for me whether this would, indeed, be a problem? If so, do you have any suggestions?

Thank you,

(Paper Reference: Henriquez and Oxenham. 2007. An alternative objective microscopic method for the identification of linear enamel hypoplasia (LEH) in the absence of visible perikymata. Journal of Archaeological Sciences: Reports 14: 76-84).

Liked by 1 person

• dougaj4 says:

I’m not an expert in statistical techniques, but I would be cautious about using a polynomial for this application. The main problem is that at a sharp change of direction polynomials tend to oscillate, rather than following the actual curve. Also the Excel Linest function becomes unreliable at 5th or 6th order. There are some functions here that perform much better for high order polynomials, that might be better for your application. Search for alglib and scipy. Both require installation of external libraries, but both are free. If you do decide to have a look at them, let me know if you have any problems with installation or running them.

Liked by 1 person

• Julia says:

Thank you very much. I’ll work on figuring out another way. I’m a bit confused, to be honest, by both alglib and skipy – they both need programming language, don’t they? A bit beyond me at the moment but I’ll continue to explore.

Like

• dougaj4 says:

The spreadsheets are actually set up so you can use the Alglib and Scipy functions without any programming, in the same way as the built-in Excel functions. The documentation in the spreadsheets is very brief though, so for advanced functions you would need to refer to the documentation for the base libraries, which is comprehensive, but not an easy read. If you have colleagues and/or tutors that can help with the maths it would probably be best to use whatever they are familiar with. If you do decide to use one of my spreadsheets feel free to ask for help though. I would be happy to set up an example using your data, if that would help.

Like

41. Joe says:

Hi, I have data with missing values and I am trying to calculate coefficients for all trend lines (i.e. linear, 2nd order poly, 3rd order poly, power, log & exp).
The examples above for second order poly are great thanks and I have used these to also get the 3rd order polys. However, I am stumped about how to now calculate for the power, log and exponential trendlines.
Any help would be great please!
Thanks, Joe

Like

• dougaj4 says:

I was going to say that the examples on the first sheet will also work with the Linestgap function, but they don’t because the input needs to be two ranges, and it wont accept LOG(range), or EXP(range).
You need to set up a third range with the Log or Exp of the data, and this needs to check for blank cells, so for instance for the first example enter:
=IFERROR(LOG(B36),””)
, say in cell N61, and copy that down to N71.
You can then use that range with linestgap:
=LINESTgap(N61:N71,X_2)
which will work for data with gaps.