Fitting high order polynomials

This post is a follow up to Using LINEST for non-linear curve fitting and the following comments from Scott Rogers and Lori Miller.  Scott found that he was getting different results from Linest and the xy chart trend line for polynomials of order 5 and 6 (6th order being the highest that can be displayed with the trend line).

In order to investigate this I have looked at fitting polynomials of different degree to the function y = 1/(x – 4.99) over the range x = 5 to x = 6.  It should be emphasised that high order polynomials are completely inappropriate for interpolating a function such as this; it was chosen purely because it shows up the differences in the results from the four different methods examined.  These were:

  1. Using the LINEST function: =LINEST(YRange, XRange^{1,2, … ,n}
  2. Using the xy chart trend line for polynomials up to 6th order
  3. Using the function provided by Lori Miller in the comments: =MMULT(MMULT(TRANSPOSE(Y),X^N),MINVERSE(MMULT(TRANSPOSE(X^N),X^N)))
  4. Using the ALGLIB “PolynomialFit” fit routine.  This is specifically intended to fit a polynomial to scattered data, using a least squares method.  I have written an Excel VBA User Defined Function (UDF) to call the ALGLIB function directly from the spreadsheet.  I will post more details of this, including a download file with open source code, in a future post.

The screenshots below show the following results for polynomials fitted to the test function:

  • Plots of the base data for 101 points between x = 5 and x = 6, and the four generated polynomial curves.
  • The polynomial coefficients for the Linest curve, the chart trend line (displayed on the charts), and the matrix function.
  • Unfortunately the current VBA version of the ALGLIB routine does not return the polynomial coefficients (this feature should be available in a few months), so I have evaluated the Linest and the Alglib functions at each of the base data x values, and found the maximum absolute difference between the two.


For 4th order polynomials:

4th order polynomial, click for full size view

the Linest, chart trend line and ALGLIB results are essentially identical, but the matrix function is already showing significant differences.

For 5th order polynomials:

5th order polynomials

the Linest, chart trend line, and ALGLIB results are still in good agreement, but the matrix function results are now completely different.

For 6th order:

6th order polynomials

The Linest and chart trend line results are now completely different, with Linest having returned a coefficient of 0 for the x term.  The ALGLIB results appear visually identical to the chart trend line.

For the 7th order only the Linest and ALGLIB results are plotted:

7th order polynomials

The Linest line has retained a form similar to the 5th order results (with zero coefficients for the x squared and x terms), but the ALGLIB line is consistent with a 7th order polynomial.

It can be seen that in all cases the polynomial lines oscillate above and below the data, which is a feature of fitting high order polynomials to a monotonic function.  To check if the behaviour of the Linest output was a result of fitting a polynomial function to inappropriate data the same exercise was carried out on a cyclic function:

y = =SIN((X-4)*A)/(X-4.99), with A set to 8pi, so that the resulting function had 4 complete cycles over the range from x = 5 to x = 6.

The results for fourth order to seventh order polynomials are shown in the four screen shots below:

Cyclic function, fourth order polynomials

Cyclic function; fith order polynomials

Cyclic function; 6th order polynomials

Cyclic function; 7th order polynomials

The results were very similar to those for the monotonic function, with the Linest results being different from the chart trend line at the 6th order, and with the x squared and x coefficients from Linest being zero for the 7th order line.

Two further lines were plotted with 10th order polynomials (Linest and ALGLIB) and 15th order (ALGLIB only), to check if a reasonably close fit to the data could be found:

Cyclic function, 10th order polynomials

Cyclic function, 15th order polynomial

It can be seen that the 10th order Linest line has maintained the form of the 5th order polynomial, with 5 of the 11 coefficients being set to zero.  The ALGLIB results appear to be appropriate to the order of the polynomial curve, and a good fit has been achieved to the data with a 15th order polynomial.

In conclusion:

  • The Excel Linest function and polynomial chart trendline produce different results for 6th order polynomials in the cases examined.  As noted by Lori Miller in the comments to the previous Linest post, this is probably because of changes made to the algorithm for dealing with co-linear data.
  • The matrix function (at least in this case) did not give good results beyond fourth order.
  • For most interpolation purposes use of a cubic spline will normally give better results than a high order polynomial.
  • For cases where a high order polynomial is appropriate the ALGLIB PolynomialFit routine appeared to give much better results than Linest.  In separate tests PolynomialFit was found to be stable up to at least 50th order, with the data presented here.
This entry was posted in AlgLib, Charts, Excel, Maths, Newton, UDFs, VBA and tagged , , , , , , , , . Bookmark the permalink.

31 Responses to Fitting high order polynomials

  1. Hui.. says:

    Doug,
    Nice Post

    Most errors from trendlines come from the accuracy of the displayed trendline equation and the fact that it will contain rounding errors, whereas the trendline itself is calculated and displayed using full precision. This can be fixed to some extend by increasing the level of precision of the equation in the trendline box up to 15 decimals.

    You should note that some versions of Excel 2007 contained errors in the equation which was displayed in the Trendline equation box for Polynomial trends greater than 3rd order.
    This was fixed by a hotfix http://support.microsoft.com/kb/938541

    In 2010 introduced several other improvements in the statistics functions http://blogs.office.com/b/microsoft-excel/archive/2009/09/10/function-improvements-in-excel-2010.aspx

    Like

  2. Jon says:

    Hui –

    That hotfix was dated June 2007, so it must have been included in SP1, or at least in SP2, which is the most recent.

    There was another early problem in Excel 2007, I think with LINEST, in which Excel would set to zero any coefficients which were on the order of “rounding error” due to the binary to decimal conversion.

    Doug –

    What I found most striking about this post was not how Excel’s poly coefficients differed from any others, but rather how poorly the polynomials fit the data. I cringe when I see poly fits higher than order 2 on charts.

    Like

  3. dougaj4 says:

    Hi Hui and Jon – thanks for the comments.

    I probably should have put the bit about ” high order polynomials are completely inappropriate for interpolating a function such as this” in bold red huge font all caps 🙂

    That said, the last chart (using the ALGLIB routine) shows a good fit to the data, and I’m sure there are occaisions where this would be an appropriate use of a high order polynomial fit, the only provisos being: not with the built-in Excel functions, and not unless you know what you are doing (or at least check that the answers make sense).

    Like

  4. Lori Miller says:

    A simple way to improve accuracy for higher powers is to substitute x by (x-Average(x)) in the linest or trend formula. For 100 equally spaced intervals, linest gives non-zero coefficients up to 20th degree for me in Excel 2010 but it looks like there were some recent improvements and another fix for this is needed in previous versions (kb/976468). Polynomial coefficients can be found by post-multiplying the linest values by the matrix of binomial terms:
    =IF(TRANSPOSE(N)>=N,COMBIN(TRANSPOSE(N),N)*(-AVERAGE(N))^(TRANSPOSE(N)-N),0)

    It’s also worth comparing with the Fourier-Legendre series coefficients again using a mean-centred scaling. These are the theoretical values from minimising the squared area between the function and the polynomial approximation (i.e. L-2 Norm). I only managed to compute the first few values but found them pretty close – perhaps a good application of the tanh-sinh method. The Alglib algorithm seems to be applying a minimax approximation, i haven’t come across this before but it’s worth following up terminology as it leads to some interesting topics in numerical analysis.

    Like

  5. Pingback: ALGLIB linear and polynomial fitting functions | Newton Excel Bach, not (just) an Excel Blog

  6. Lori Miller says:

    Oops just noticed a typo in the matrix formula above, it should be AVERAGE(X), not AVERAGE(N). Naming the formula above z, and entering:
    =MMULT(LINEST(Y,(X-AVERAGE(X))^{1,2,3,4,5,6}),z)

    returns near identical results to the XNumbers function xRegPolyCoef. Comparing the chart trendline values shows up small inaccuracies in the displayed equation giving roughly 6 or 7 digits of accuracy in values. The results also matched against Alglib for the 15th degree polynomial example using:
    =TREND(Y,(X-AVERAGE(X))^COLUMN(A:O))

    So regression functions are finally more robust in Excel 2010 and extend beyond 16th degree, though curiously this wasn’t mentioned in the Excel blog post referred to in Hui’s link.

    Like

  7. Georg says:

    Hi all, I should have been more explicit when suggesting to use Legendre’s polynomials in the previous thread, sorry… If the x-data are evenly spaced, subtracting the mean removes the correlation between x and x^2 as well as that between x^2 and x^3 but not that between x and x^3, for example. Even using Legendre’s polynomials with the usual coefficients in this case does not yield exactly orthogonal basis functions. In order to establish an exactly orthogonal set, the rarely mentioned “discrete Legendre polynomials” have to be used. The 1 and x polynomials remain unchanged, but in case of N evenly spaced x-values (transformed onto the interval [-1;1] !) the second and third discrete Legendre polynomials read:
    a_2*x^2 + (1-a_2) , a_2 = 3/2*(N-1)/(N-2) and
    x*(a_3*x^2 + (1-a_3)) , a_3 = 5/2*(N-1)^2/((N-3)*(N-2)) .
    In the limit N->inf the coefficients approach the usual values that are valid for continuous x.
    The recurrence relation that yields all the higher order discrete Legendre polynomials can be found on page 13 of Kelly Johnson’s M.Sc. thesis that can be downloaded from the web:
    http://esr.lib.ttu.edu/bitstream/handle/2346/17982/31295017084426.pdf?sequence=1
    I had used the free CAS Maxima to determine the a_2 and a_3 coefficients before finding that resource. Therein, N+1 evenly spaced values are assumed, so that the formulae are a bit different from those given before.
    I think that due to Gibb’s phenomenon, people rarely use high order polynomials but prefer piecewise fitted cubic splines.

    Like

  8. Pingback: Daily Download 14: Curve Fitting 1 | Newton Excel Bach, not (just) an Excel Blog

  9. Richard says:

    The error in the Trendline function for polynomial equations of cubic and above power still exists in Excel 2010.

    Like

    • dougaj4 says:

      The results above were using Excel 2010, or did you mean 2013?

      Also I haven’t seen problems with a cubic. Could you give an example?

      Like

  10. Georg says:

    Hi Doug,
    I’ve just obtained the following two examples using fully patched XL14x64 on Win7Profx64, Proc is i5-650.
    x = 1,2,…,9,10; ci := ((-1)^i)*i for i = 1,2,…,6, and c0 := 9
    1st example: differences between trendline and linest
    y6 := (c6*x+c5)*x+c4)*x+c3)*x+c2)*x+c1)*x+c0

    trendlineLINEST (EN) == RGP (DE)
    +6.00000000745058E+00+6.00000000000022E+00
    -5.00000064074993E+00-5.00000000000730E+00
    +4.00000786036252E+00+4.00000000009572E+00
    -3.00004807114601E+00-3.00000000063311E+00
    +2.00071369856595E+00+2.00000000221565E+00
    -1.00939099490642E+00-1.00000000386880E+00
    +9.02397645264863E+00+9.00000000262104E+00

    Obviously, the accuracy of the parameters of a trendline in case of a high-order polynomial generally is pathetic. Remember that the highest-order x values only need 30 bits at maximum. The parameters returned by LINEST are much more accurate, but of course suffer from cancellation and intermediate squaring.
    ———————————————
    2nd example: even a parabola is not exactly recovered
    y2 := (c2*x+c1)*x+c0

    LINEST - Rsq +1 (exactly)

    LINEST - C2 +2.00000000000000E+00
    LINEST - SD +1.51E-16
    LINEST - ER -4.44E-16

    LINEST - c1 -9.99999999999996E-01
    LINEST - SD +1.70E-15
    LINEST - ER +3.77E-15

    LINEST - c0 +8.99999999999999E+00
    LINEST - SD +4.08E-15
    LINEST - ER -1.07E-14

    Please notice the contradiction between Rsq=1 and the differences between the estimated standard deviations (SD) and the real errors (ER), which are all much larger than the SDs… this is not funny…
    We do not use XL for serious calculations any more. On the other hand, it has turned out to be a real burden to get accurate results on the SSE FPU when the code was developed for the x87 FPU.
    Unfortunately, a phenomenon called “double rounding” prevents fast multi-precision code that is based on the expansion concept from being usable in VBA and XL (at present, I do not understand why this problem occurs in XL, because it is said to use the SSE FPU, in which double rounding cannot occur. So some parts of XL seem still to use the x87 FPU). Only the relatively slow multiple-digits concept does work in VBA, as can be seen from xNumbers. I will have some time for looking at XL after Feb 19th…

    Like

  11. Dragos says:

    Excellent work !!!! Nice job! Congrats!
    One question: where can I find(module and var name) the coeficients of the created polynome ?
    Thanks

    Like

    • dougaj4 says:

      Dragos – The Alglib routine used in the spreadsheet does not return the polynomial coefficients. It uses the barycentric form, and passes these coefficients to the routine BarycentricCalc, which is in the mAL_ratint module. There must be a reasonably simple way to get the coefficients from the barycentric form, but I don’t know what it is. It may be included in the latest Alglib release, but this is not available in a VBA version.

      One thing you could do is find the coordinates for a sufficient number of points to define the function, then use one of the functions in http://interactiveds.com.au/software/Polynomial.zip to solve the polynomial. Not a very efficient way to do it, but it should work.

      Like

      • Dragos says:

        I see. Thank you for your qiuck reply!
        I’ll study further more the BarycentricCalc routine.
        In http://interactiveds.com.au/software/Polynomial.zip is only 4th degree poly, so I saw… corect me if I’m wrong!
        I’m useing 50 to 60 degree poly for telecom, and thus, I need to get to coef of the poly to make a prediction of the graph at the n+1 plot (ex: for a poly with X from 1 to 50, I calculate the same poly at the 51st point, eventhou the 51st point haven’t happen yet 😉 )
        But your subroutine doesn’t alow me to input the X+1 param, because there is no Y+1.
        But your routines are so fast, and very exact! I need somehow to calculate an Y for the X+1 point. The only way is to get the coef and calculate the poly manualy (VBA func). Or do you have another ideea? This would be very helpfull!

        Another question: can I access AL_FitPoly from within the VBA cone with some arrays as input param (I don’t want to use Cells from the worksheet)?

        Another question: If I had the coef of the poly, which is the name of the procedure to calculate the poly (I’ve tryed to write one, but I’m getting strange Y – I don’t know why … 😦 )

        Thank you very much!

        Like

  12. dougaj4 says:

    Dragos – The Polynomial spreadsheet has a function that will solve polynomials of any order using an iterative method (RPolyJT), see the Jenk-Traub sheet.

    But I don’t think you need it! If you just want to find the Y value at X values other than those in the input data you just need to specify a column range with the output X values listed, as the last function argument (XIA). In the download spreadsheet the output X values are listed in Column C (Headed XIA). You can enter your own values, and the function will return the corresponding Y value. You should however note that a high order polynomial can diverge very quickly, and may not be a good choice for performing extrapolation.

    You can call AL_FitPoly from another VBA routine. The multi-cell arguments can either be passed as range objects or variant arrays.

    To calculate the value of a polynomial with known coefficients the Polynomial spreadsheet includes the function EvalPolyHC. A simple evaluation of a high order polynomial will often give inaccurate results because you are subtracting very large numbers. The EvalPolyHC function uses an algorithm that avoids this problem, and should give accurate results.

    Like

    • Dragos says:

      Thanks. I found out how to call the func in VBA! Works excellent!
      I’ve also checked with XIA. Works fine; I didn’t get it the first time I saw the code, but now after your reply, I got it! Thanks 😉
      I will check also EvalPolyHC …
      One more Q: Does the algorithm written in C# or VB.net have the possibility to return the actual coeff ? If yes, then it’s solved, I can link the dll to the VBA prj, and use the functions from C#!
      Thank you very much !

      Like

  13. Lori Miller says:

    Dragos, in addition to the remarks from Doug, here are a few extra points (which i hope come out ok):

    1. For converting barycentric to polynomial coefficients, you can post-multiply by the matrix of binomial terms, as mentioned above in relation to LINEST. Suppose that:
    b is the (n x 1)-vector of barycentric coefficients
    x is the (1 x k)-vector of observations,
    N = {n-1,…,1,0} is the (n x 1)-vector of indices
    then the polynomial coefficients are given by:
    =mmult(b,iferror(combin(transpose(N),N)*(-average(x))^(transpose(N)-N),0))
    in the case average(x)=0, the result is simply b.
    As a check note that for given x and y values:
    =mmult(linest(y,(x-average(x))^{1,2}),iferror(combin({2;1;0},{2,1,0})*(-average(x))^({2;1;0}-{2,1,0}),0))
    equals linest(y,x^{1,2}) but mitigates collinearity problems.
    [Cancellation issues are unlikely to occur in such a sum but if you are worried you can use Horner evaluation – which i think might be covered by another blog post.]

    2. For y-predictions based on a polynomial in x at a given value z you can use:
    =trend(y,(x-average(x))^transpose(row(1:16)),(z-average(x))^transpose(row(1:16)))
    In Excel 2010 regression functions can be extended beyond 16th power and were further improved.

    3. See also the follow up post: https://newtonexcelbach.wordpress.com/2011/02/17/alglib-linear-and-polynomial-fitting-functions/. You can use Application.Run to call these functions from another workbook in VBA.

    Like

    • dougaj4 says:

      Hi Lori – thanks for that; looks like it should be quite straightforward to incorporate that in the UDF. I’ll also be looking at updating to the latest Alglib version, when I find some time.

      Like

      • Lori Miller says:

        Doug – yes, update to the later version of Alglib! i was taking barycentric to just mean “centered” (as here), but Alglib interprets it in terms of rational functions. From the article on polynomial interpolation the benefits of this approach make sense but i’m a little skeptical as to the difference it makes for least squares fitting where SVD is being applied as well – but who am i to say.

        Like

    • Dragos says:

      Lori, thanks for you input !
      I’m having trouble with your solution with MMULT ! 😦

      Q1: value Z? I have an array of X=1 to 50, Y (for the plots). Tell me, without any predictions, after using AL_FitPolyCW, what is the formula to get the coeff with MMULT?
      Q2: excel beyond 16th power – no success ! 😦
      Code:
      Dim coef_temp() As Variant
      With Application
      coef_temp = .LinEst(y, .Power(X, .Transpose(Array(16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0))), False, False)
      End With
      ‘ X & Y are variant array with the data

      If I’m increasing beyond 16th power, I’m getting 0’s for many coeff in Office 2010 !!!
      Do you have any tips/solutions ?

      Thank you so much !

      Like

      • Lori Miller says:

        Q1) The AlgLib function polynomialbar2pow, available in other languages, could be called to convert the variable p that is generated by Al_FitPoly. However the notes say that this is not to be relied on for powers greater than around eight even after normalisation and so you may be better off calculating directly using Linest. If you’re determined to find coefficients for a 50th degree polynomial (even after reading other advice) you’ll need to use extended precision calculations like is available in Xnumbers.

        Q2) Take a look at the second example in this post which is available in the AL-Spline-Matrix download, and transform data to the interval [-1,1] using z=2x-11. Comparing results, the coefficients of Linest and Al_Linest begin to diverge from Xnumbers above 15th degree polynomial in z. Trend matches Al_FitPoly up to 20th degree but drops coefficients after that in my version. Using Xnumbers functions to find the coefficients and sum the series for 50th degree and 50 digits of precision gives a near exact fit (RSS=1E-12) and identical results to Al_FitPoly.

        Like

  14. Lori says:

    Since this post continues to appear on the list of active posts, i’d just like to add that a similar piece of analysis recently showed that switching to Chebyshev Polynomials does indeed give significant improvements comparable in accuracy to AlgLib. Further, it is still possible to use sheet functions for this.

    The nth Chebyshev Polynomial (of the first kind) is defined by T[n](x) = cos(n*acos(x)) (|x|<1). A formula for fitted y values for a 50th degree polynomial in x (shifted to the interval [-1,1]) is:

    =TREND(y,COS(TRANSPOSE(ROW(1:50))*ACOS(2*(x-MIN(x))/(MAX(x)-MIN(x))-1)))

    Changing TREND to LINEST returns the Chebyshev coefficients which are all non-zero for 100 equally spaced points – as in the above examples. To find other values at x=z, repeat the second argument in the third argument and change the first x to z .

    Like

  15. dougaj4 says:

    Hi Lori, thanks for that.
    I’ll be looking at linking to Python curve fitting routines when I have time, including the Python version of Alglib, and I’ll have a closer look at your functions then. Do you have a spreadsheet with some examples?

    Also, for future reference, a link to a discussion on the Alglib Forum about using Alglib/Python with Numpy arrays: http://forum.alglib.net/viewtopic.php?f=2&t=550

    Like

    • Lori says:

      Here’s a file based on an example from the post with adjustable parameters: Polynomial Trend (this link may be available for a short time so maybe this could be moved to your downloads page). Tests in this SO post suggest the cosine formula is equal if not more efficient than the recursive approach.

      I’d be interested to see your next polynomial post and will have a play with scipy and linking to alglib. From my limited experience with it, i’ve also liked the functional aspects of python, provided by maps, which allow for clean compact code when implementing stuff like this.

      Like

  16. seb says:

    Hi,

    I am trying to find the coef of a 10th order polynomial curve. Everything works fine until I try to get the coefficient. Whatever I do, I get the damn #value? error. I am using the function posted by Lori =MMULT(MMULT(TRANSPOSE(Y),X^N),MINVERSE(MMULT(TRANSPOSE(X^N),X^N))) which for me is: =MMULT(MMULT(TRANSPOSE(D3:D18);C3:C18^{1;2;3;4;5;6;7;8;9;10});MINVERSE(MMULT(TRANSPOSE(D3:D18^{1;2;3;4;5;6;7;8;9;10});C3:C18^{1;2;3;4;5;6;7;8;9;10}))).

    Like

  17. This quintic equation solves for r=distance from Moon to the Lagrangian Point 1, where Me=Earth mass, Mm=Moon mass, and R=radius of Moon’s orbit.
    (Me/(R+r)^2)+(Mm/r^2)=(Me/R^2)+r(Me+Mm)/R^3
    Where Me is very much larger than Mm (as here), the equation reduces to an approximate answer of:
    r=R*(Mm/3*Me)^1/3
    I wish to solve for any values of Me and Mm. Is there an Excel function or method that will accomplish this?

    Like

  18. Pingback: Solving the Lagrangian Point equation for the Moon | Newton Excel Bach, not (just) an Excel Blog

  19. Pingback: LinEstGap with non-linear functions | Newton Excel Bach, not (just) an Excel Blog

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.