Continue to Site

# A question for the math experts

Status
Not open for further replies.

#### suiram

##### Junior Member level 3
Hi all,
I have to evaluate a function that contains termens like x^0.8+y^3.6
Since I have to execute this a few hundred times per second on a relatively slow uC, I can't afford to use the standard pow from the C library because it's slow.
Isn't any way to decompose x^0.8 for example into an expression faster to evaluate that using the standard pow algorithm?
I would apreciate any advices or pointers.
Thanks

Hi suiram,

I advice to use polynomial approximations: you need only multiplications and additions. For finding the coefficients of the approximationg polynomial, you need to specify the range of interest and a error criterium to minimize (least-squares, maximum absolute-value, integral absolute-value,). The polynomial order depends of the allowed error.

You can use several methods for calculating the polynomial. One of them is the funcion POLYFIT of matlab (uses least squares). For example, with this function I found this approximation for z=y^3.6 using a 3rd order poly in the range [0,1]:
zaprox=(1.62257972218847)*y^3 + (-0.78204046757632)*y^2 + 0.16022572661344*y + (-0.00767901243029)

Once you have the coefficients, you have different forms to implement the program: one possibility is

y2 =y*y;
y3=y2*y;
z=b3*y3+b2*y2+b1*y+b0;

But another one is:

z= (((y+a0)*y+a1)*y+a2)*a3;

It depends of your uC and how you code (C or assembly, fixed or floating point) which one is more efficient.

Regards

Z

" Once you have the coefficients, you have different forms to implement the program: one possibility is

y2 =y*y;
y3=y2*y;
z=b3*y3+b2*y2+b1*y+b0;

But another one is:

z= (((y+a0)*y+a1)*y+a2)*a3;

It depends of your uC and how you code (C or assembly, fixed or floating point) which one is more efficient. "

I think second equaition fast works and short, becouse formula need only 3 multiplification and 3 addition.

Just a question. What degree of accuracy does the approximation need to have and what are the expected ragne in the varables x & y?

pow

The idea with polyfit sounds great.
Unfortunatelly I don't know the range.

To be more specific here is the code that I try to optimize:

static float EvaluateC_PF(DEB_PARAM*deb_param,float RD,BOOL first_call,void* cache_buf)
{
float C;
float D,L1,L2,A,M;
float tmp1,tmp2,tmp3;
float* cache=(float*)cache_buf;
A = pow((19000.0*deb_param->beta)/RD,0.8);
if(first_call)
{
D=deb_param->Dt*1000.0;
L1=L2=25.4/D;
M=2*L2/(1-deb_param->beta);
tmp1=(0.043+0.080*exp(-10.0*L1)-0.123*exp(-7.0*L1))*(deb_param->beta4/deb_param->beta1m4)*(1-0.11*A);
tmp1-=0.031*(M-0.8*pow(M,1.1))*deb_param->beta13;
tmp2=0.5961+0.0261*deb_param->beta2-0.216*deb_param->beta8;
tmp3=0.11*(0.75-deb_param->beta)*(2.8-D/25.4);
cache[0]=D;
cache[1]=L1;
cache[2]=L2;
cache[3]=M;
cache[4]=tmp1;
cache[5]=tmp2;
cache[6]=tmp3;
}
else
{
D=cache[0];
L1=cache[1];
L2=cache[2];
M=cache[3];
tmp1=cache[4];
tmp2=cache[5];
tmp3=cache[6];
}
RD=1e6/RD;
C=tmp2+0.000521*pow(RD*deb_param->beta,0.7);
C+=(0.0188+0.0063*A)*deb_param->beta35*pow(RD,0.3);
C+=tmp1;
if(D<71.12)
C+=tmp3;
return C;
}
It's very hard to know the range of parameters in such equation.

Hi suiram,

You can proceed in this way for covering all the needed range:
Suppose you want to calculate y=x^A (for x positive).
Let x=M*2^E with 0.5<=M<1 (M and E are the floating-point mantissa and exponent of x).
Then y = (M*2^E)^A = M^A * (2^A)^E.
As 0.5<=M<1, you can calculate by polynomial approximation the function M^A valid in the range [0.5, 1) in the way I described above, and multiply the result by (2^A)^E. As the possible values of E are intigers and limited, you can construct a look-up table for this factor, or perform this (but it is slower):

#define pow2_A ...... // 2^A
#define pow2_minusA ...... // 2^-A

factor = 1;
if (E>0) {
for (ii=E; ii; ii--)
factor *= pow2_A;
}
else if (E<0) {
for (ii=E; ii; ii++)
factor *= pow2_minusA;
}
// Now, factor==(2^A)^E

Stay connected.
Regards

Z

pow

Hi zorro,

I've implemented the algorithm you described. The idea is so simple...after you know it. This algorithm executes 4 times faster than my standard pow function.
Still I do have problems if base is greater than 1. Please take a look at attached file.

Hi suiram,

your program seems to be OK. What do you mean saying that you do have problems if base is greater than 1?

Some suggestions that could optimize (just a little) your code on uC:

1) Many uC do have a hardware multiplier (but dont't have a hardware divider). In that case division is slower than multiplication (even in floating point). You can avoid the final division (1.0/Answer) using the constant POW_2_3x6 when exponent is negative.

2) (RemainingMultiplicity&1) is faster than (RemainingMultiplicity%2)

3) (RemainingMultiplicity >> 1) is faster than (RemainingMultiplicity /= 2) and is the same because !(RemainingMultiplicity%2)

4) arithmetic with float types could be faster than with double (this is implementation dependent). [But it's better to test the algorithm with doubles].

Rather than the average of errors, the squared-root of the average of squared relative errors (or average of absolute relative errors) is more meaningful for evaluating the approximation.
Using a high-order approximation (as your 8th order polynomial), when you compare the results with those of the standard pow() function, the errors of this function should be considered because they could be comparable to the errors of your xpow() function. The errors of the C library are implementation-dependent; they can be not too accurate because accurate functions cuold be very slow with a uC [pow() function performs log, multiplication and exp]. Results from the PC program are reliable as reference, but non necessarily if you test on the target uC.

Regards

Z

Hi,
Is there a range for x and y. If you know the ranges it can be easy to get a better approximation.

For example,

if x < 2.0, then the square root can be easliy evaluated, refer attachment but you may optimize on it.

say z = sqrt(x), x^(0.8) = x^(5/2/2) = z^5

Pl remember the number of iterations depends on accuracy required.

If you need simple implementations try mclaurin/taylors expansion.

Fast Exponentials in assembly or C

There is another approach to this problem if you have some spare memory in your prototype. Basically, you use a Look Up Table (LUT). Simply generate a log and an antilog table using any programming tool, and load the tables at program start-up. Then your equation: x^0.8+y^3.6 reduces to

result=antilog[log(x)*.8] +antilog[log*3.6]

Hope this helps

Status
Not open for further replies.