#### Polynomial Approximation (Line-Fitting): An Extremely Simple Method

by

, 22nd March 2014 at 14:19 (24206 Views)

We often plot some data on a graph and wonder if there might be a simple mathematical function to describe what we see:

In many cases, apolynomial approximationwould be particularly useful. That is, we want to describe the line in the form:

for some maximum powerN(called theorderof the polynomial). There is an unbelievably simple trick for doing this, which for the above graph tells us that the data is well approximated by the following 3rd order polynomial:

We can confirm the accuracy of this approximation by plotting it with the original data:

I will now describe how to do this. I will give specific details (andsource code) for a Matlab implementation, but will also describe the theory so you can implement the method in any way you choose.

The Method

This method is so simple, it can be done using just 2 lines of Matlab code:

Code Matlab M - [expand] 1 2 C = bsxfun(@power, (0:L).', (0:N)); a = pinv(C)*y;

The first line simply creates a (L+1 x N+1) matrix, C, whose (n,p)th element is n^{p}. The second line forms the Moore-Penrose pseudoinverse of C and multiplies this with the input data vector, y.

Why does this work? Well, if we insert x = 0, 1, 2, ..., L into Equation (1), then we are saying that the data values will be approximated by:

We can rewrite this as a matrix equation:

or, defining a variable for each of the bracketed terms, simply:

So, all we have to do now is find the vectorawhich best fits this equation (allowing us to then simply plug a_{0}, a_{1}, a_{2}, ..., a_{N}back into Equation (1)). I say "best fits" because an exact solution may not exist. For L>N, this is anoverdeterminedsystem of equations, so we may have to settle for an approximate solution. The most popular approximate solution I know of is the "minimum norm" solution, which is given by:

whereC^{+}denotes the Moore-Penrose pseudoinverse ofC. And... that's all there is to it! We can check our approximation by substitutinga_{approx}back into Equation (3):

and plotting this againsty, as I did above.

Final Comments

There are two issues that should be mentioned:

- Overfitting
- Numerical accuracy

If polynomial order, N, is set too high, overfitting may occur. This means that obscure higher-order terms may appear in the solution which only act to describe noise (rather than the true underlying process). Always try lower values of N first.

Numerical rounding errors can become an issue, particularly for larger N. Again, stick to lower N where possible. I have also heard that a QR decomposition of C will often yield a slightly more numerically-accurate solution fora. In this case, the Matlab code for computinga_{approx}would be:

This solution can then be refined as follows:

(and this refinement can be repeated iteratively).

Code Matlab M - [expand] 1 2 3 r = y - C*a; err = R\(R'\(C'*r)); a = a + err;

Matlab Code

Full Matlab code:

poly_approx.zip

Simplified code with minimal comments:

Code Matlab M - [expand] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 % Maximum order of polynomial N = 3; % Generate some input data L = 24; % maximum x value (starting from 0) x = (0:L).'; y = (1/3)*x.^3 - 10*x.^2 + 75*x - 33; y = y + 10*randn(L+1,1); % add some random noise to make it less obvious % Calculate a (approx) C = bsxfun(@power, (0:L).', (0:N)); a = pinv(C)*y; % Check our approximation of y y_approx = C*a; % Plot results plot(x,y); grid on; hold on; plot(x,y_approx,'--r'); xlabel('x'); ylabel('y'); legend('Input data', 'Polynomial Approximation');