
I am implementing a method berechneCosinus(float x, int ordnung) in Java that should approximate the cosine of x using the Taylor series up to the specified order.
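
For reference, this is the series I am truncating:

cos(x) = 1 − x^2/2! + x^4/4! − x^6/6! + … = Σ (−1)^k · x^(2k) / (2k)!

with terms kept up to the given order.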

My questions:

Can I optimize the computation inside the method, without changing the signatures, in order to exactly match the given test results?

Here is what I have so far:

public class Taylor {

    public float berechneFak(float x) {
        if (x <= 1) return 1.0f;
        int n = Math.round(x);
        float fak = 1.0f;
        for (int i = 2; i <= n; i++) {
            fak *= i;
        }
        return fak;
    }

    public float berechneCosinus(float x, int ordnung) {
        float pi = 3.14159265f;
        float zweiPi = 2.0f * pi;

        // normalize x to the interval [-pi, pi]
        x = x % zweiPi;
        if (x > pi) x -= zweiPi;
        else if (x < -pi) x += zweiPi;

        if (ordnung == 0) return 0.0f;

        int anzahlTerme = (ordnung + 1) / 2;   // number of even-power terms to sum

        float summe = 0.0f;
        float term = 1.0f;                     // the x^0 term

        for (int i = 0; i < anzahlTerme; i++) {
            int n = i * 2;
            if (i > 0) {
                term *= -x * x / ((n - 1) * n);   // turn the previous term into the x^n term
            }
            summe += term;
        }
        return summe;
    }
}

This is the assignment:

public class Taylor {


  public float berechneFak(float x) {
    // insert code here!
    // remember: the lecture!
    return 0.0f;
  }


  public float berechneCosinus(float x, int ordnung) {
    float acc = 0.0f;
    // Your code here!
    return acc;
  }

}

The test cases and the expected vs. received results (Testfälle, Erwartet / Erhaltenes Ergebnis) were provided as screenshots; the relevant values are reproduced in the example below.

I implemented the cosine calculation method using the Taylor series and normalized x to the interval [−π,π]. However, the results slightly differ from the expected values in some test cases. Since I’m not allowed to change the method signatures and can only work with floats, I’m looking for tips on how to improve the accuracy.

Here is a minimal reproducible example:

public class Taylor {
    public float berechneCosinus(float x, int ordnung) {
        if (ordnung == 0) return 0.0f;

        float sum = 0.0f;
        float term = 1.0f;
        for (int i = 0; i < (ordnung + 1) / 2; i++) {
            int n = 2 * i;
            if (i > 0)
                term *= -x * x / ((n - 1) * n);
            sum += term;
        }
        return sum;
    }

    public static void main(String[] args) {
        Taylor t = new Taylor();
        float x = 3.14159265f; // approx. PI
        System.out.printf("%.6f\n", t.berechneCosinus(x, 13));
        System.out.printf("%.6f\n", t.berechneCosinus(x, 14));
    }
}

The actual output is:

-0.999900  
-0.999900  

The expected output should be:

-0.999899  
-0.999899

I'm aware of the limitations of floating point; the problem is that the tests check for exactly -0.999899. I'm wondering if it's possible to restructure the calculation to get the expected result. Nonetheless, I'm going to ask my professor; it could also be that there is a mistake in the testing code.

Solution

Thanks for all the comments and suggestions; I appreciate all the effort you have made so far. I asked my professor today and he gave me a hint, so I have the answer now.

I can't say why we should do it the way he wants, but here is the final code:

public class Taylor {

    public float berechneFak(float x) {
        float produkt = 1.0f;
        for (int i = 2; i <= x; i++) {
            produkt *= i;
        }
        return produkt;
    }

    public float berechneCosinus(float x, int ordnung) {
        float acc = 0.0f;
        float a = 1.0f;                                   // alternating sign: +1, -1, +1, ...
        for (int i = 0; i < ordnung; i += 2) {
            acc += (a / berechneFak(i)) * Math.pow(x, i); // i-th term: (-1)^(i/2) * x^i / i!
            a *= (-1);
        }
        return acc;
    }
    public static void main(String[] args) {
        Taylor t = new Taylor();
        float x = 3.14159265f;
        for (int ordnung = 0; ordnung < 15; ordnung++) {
            System.out.printf("%.6f%n", t.berechneCosinus(x, ordnung));
        }
    }
}
  • 5
    Please do not upload images of test data. Commented Jun 21 at 19:30
  • 4
    But then again, all of that was pointed out in staging ground. Why was this posted, with all the major changes reviews? Commented Jun 21 at 20:46
  • 1
    @Andre This is kind of a meaningless question, on several levels. Yes, it's possible to improve the posted code, but shooting for an answer of exactly -0.999899 is crazy. The posted code is actually doing better at getting the "correct" answer, which of course is -1.0, even when you take the limited precision of float into account. Six places past the decimal isn't even enough to see exactly how close or far you are to the "expected" result. Commented Jun 21 at 21:19
  • 2
    Using Math.Pow(x, i) and computing the factorial again from scratch for every term suggests that your professor doesn't understand numerical analysis or efficiency. Commented Jun 24 at 22:24
  • 2
    if you have an answer please add it as an answer (text box and button at the end of this page), not in the question (you should be able to answer your own question -- see Posting my work as an answer or including it in the question if it is complete, but not necessarily the best answer?, mainly its accepted answer ((also there is no need to add SOLVED, CLOSED, or whatever to the title)) Commented Jun 26 at 13:50

2 Answers


I pity the OP lumbered with such a silly and inflexible accuracy target for their implementation as shown above. They don't deserve the negative votes that this question got but their professor certainly does!

By way of introduction, I should point out that we would not normally approximate cos(x) (or sin(x)) over such a wide range as 0-pi, because doing so violates the important heuristics generally used for series expansions that approximate functions in computing, namely:

  1. Each successive term in the series satisfies |a_n x^n| > |a_(n+1) x^(n+1)|
  2. Terms are added together smallest to largest to control rounding error.

IOW, each successive term is a smaller correction to the sum than its predecessor, and the terms are typically summed from the highest-order term first, usually by Horner's method, which lends itself to modern CPUs' FMA instructions that can process a combined multiply and add with only one rounding per machine cycle.
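
As a small illustration (not from the original post), a Horner evaluation of an even polynomial in Java might look like the sketch below; hornerEvenPoly is a made-up helper name, the coefficient array is a placeholder, and Math.fma (Java 9+) performs the fused multiply-add where the hardware supports it.

// Sketch only: Horner's method for an even polynomial
// c[0] + c[1]*x^2 + c[2]*x^4 + ..., summed from the highest-order term down.
// Math.fma computes a*b + c with a single rounding.
static float hornerEvenPoly(float x, float[] c) {
    float x2 = x * x;
    float acc = c[c.length - 1];
    for (int i = c.length - 2; i >= 0; i--) {
        acc = Math.fma(acc, x2, c[i]);   // acc = acc * x^2 + c[i]
    }
    return acc;
}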

To illustrate how range reduction helps, the test code below runs the original convergence test with different numbers of terms N, for argument limits of pi, pi/2 and pi/4. The first case, evaluation at pi, thrashes about wildly at first before eventually settling down. The last, pi/4, requires just 4 terms to converge.

In fact we can get away with the wide range in this instance because sin, cos and exp are all highly convergent power series with a factorial in the denominator, although the large alternating terms added to the partial sums when x ~ pi do cause some loss of precision at the high end of the range.

We would normally approximate over a reduced range of pi/2, pi/4 or pi/6. However, taking this challenge head on, there are several ways to do it. The different simple methods of summing the Taylor series can give a few possible answers, depending on how you add them up and whether or not you accumulate the partial sum in a double precision variable. There is no compelling reason to prefer any one of them over another; the fastest method is as good as any.
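
As an aside, a range reduction step for cosine could look like the following Java sketch (not from the original post); cosSeries is a hypothetical placeholder for any series that is accurate on [0, pi/2].

// Sketch: fold the argument into [0, pi/2] using cos(-x) = cos(x), the 2*pi
// periodicity, and cos(x) = -cos(pi - x), then evaluate a short series on the
// small interval where it converges in a few terms.
static float cosReduced(float x) {
    final float PI = 3.14159265f;
    x = Math.abs(x) % (2.0f * PI);       // cosine is even and 2*pi periodic
    if (x > PI) x = 2.0f * PI - x;       // fold [pi, 2*pi] onto [0, pi]
    boolean negate = x > 0.5f * PI;
    if (negate) x = PI - x;              // fold [pi/2, pi] onto [0, pi/2]
    float y = cosSeries(x);              // placeholder: any series valid on [0, pi/2]
    return negate ? -y : y;
}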

There is really nothing good about the professor's recommended method. It is by far the most computationally expensive way to do it, and for good measure it violates the original specification of computing the Taylor series when N >= 14, because 14! cannot be accurately represented in a float32 - the value is truncated.

The OP's original method was perfectly valid and neatly sidesteps the risk of overflow of x^N and N! by refining the next term to be added on each iteration inside the summation loop. The only refinement would be to step the loop in increments of 2 and so avoid computing n = 2*i, as in the sketch below.
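
In Java that refinement might look like this (my sketch, not the OP's code; the OP's special case for ordnung == 0 is left out):

// Sketch of the OP's incremental-term loop, stepped by 2 so the power index n
// is the loop variable itself. Each pass turns the x^(n-2) term into the x^n term.
// The bound n < ordnung keeps the same term count as the original (ordnung+1)/2 loop.
public float berechneCosinus(float x, int ordnung) {
    float summe = 1.0f;                  // the n = 0 term
    float term = 1.0f;
    for (int n = 2; n < ordnung; n += 2) {
        term *= -x * x / ((n - 1) * n);
        summe += term;
    }
    return summe;
}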

@user85421's comment reminded me of a very old-school way to obtain a nearly correct polynomial approximation for cos/sin by nailing the result at a specific point to be exact. Called "shooting", and usually done for boundary value problems, it is the simplest and easiest to understand of the more advanced methods for improving polynomial accuracy.

In this instance we adjust the very last term, in x^N, so that it hits the target cos(pi) = -1 exactly. It can be manually tweaked from there to get a crude, nearly equal-ripple solution that is about 25x more accurate than the classical Taylor series.
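
A hedged Java sketch of that adjustment follows; taylorCos is a placeholder for a plain truncated series, and the extra hand-tuned fiddle factor used in the code further down is omitted here.

static final float PI = 3.14159265f;

// Sketch of the "shooting" idea: keep the series terms up to x^10 and replace
// the Taylor coefficient of the final x^12 term with one chosen so that the
// polynomial returns exactly -1 at x = pi.
static float shootCoeff() {
    float p2 = PI * PI, p4 = p2 * p2, p12 = p4 * p4 * p4;   // p12 = pi^12
    return (-1.0f - taylorCos(PI, 10)) / p12;               // nails cos(pi) = -1
}

static float shootCos(float x, float c12) {
    float x2 = x * x, x4 = x2 * x2, x12 = x4 * x4 * x4;
    return taylorCos(x, 10) + c12 * x12;                     // taylorCos: placeholder helper
}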

The fundamental problem with the Taylor series is that it is ridiculously over precise near zero and increasingly struggles as it approaches pi. This hints that we might be able to find a compromise set of coefficients that is just good enough everywhere in the chosen range.

The real magic comes from constructing a Chebyshev equal-ripple approximation using the same number of terms. This is harder to do for 32-bit floating point, and since many modern CPUs now have double precision arithmetic that is as fast as single precision, you often find double precision implementations lurking inside nominally float32 wrappers.
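
For example (my sketch, not code from the post), a float-facing routine that accumulates in double could look like this; cosDoubleInside is a made-up name.

// Sketch: the public interface stays float, but the running sum and term are
// kept in double, so almost all of the accumulation error disappears and only
// the final conversion back to float rounds.
static float cosDoubleInside(float x, int ordnung) {
    double sum = 1.0, term = 1.0;
    double mx2 = -(double) x * x;
    for (int n = 2; n <= ordnung; n += 2) {
        term *= mx2 / ((n - 1) * n);
        sum += term;
    }
    return (float) sum;
}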

It is possible to rewrite a Taylor series into a Chebyshev expansion by hand. My results were obtained using a Julia numerical code ARMremez.jl for rational approximations.

Getting the best possible coefficient set for fixed-precision work in practice requires a huge optimisation effort and isn't always successful, but getting something that is good enough is relatively easy. The code below shows the various options I have discussed, with sample coefficients. The framework tests enough of the range of x values to put tight bounds on the worst-case absolute error |cos(x) - poly_cos(x)|.

In real applications of approximation we would usually go for minimum relative error |1 - poly_cos(x)/cos(x)| (so that ideally all the bits in the mantissa are right). However, the zero at pi/2 would make life a bit too interesting for a simple quick demo, so I have used absolute error here instead.

The 6-term Chebyshev approximation is 80x more accurate, but the error is in the direction that takes cos(x) outside the valid range |cos(x)| <= 1 (highly undesirable). That could easily be fixed by rescaling. The approximations have been written as hardcoded, fixed-length Horner polynomial implementations that avoid any loops (and are 20-30% faster as a result).

The worst case error in the 7 term Chebyshev approximation computed in double precision is 1000x better at <9e-8 without any fine tuning. The theoretical limit with high precision arithmetic is 1.1e-8 which is below the 3e-8 Unit in the Last Place (ULP) threshold on 0.5-1.0. There is a good chance that it could be made correctly rounded for float32 with sufficient effort. If not then 8 terms will nail it.
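
(As a quick Java check of that figure: the ULP of a float just below 1.0 is 2^-24, so the best achievable rounding error, half a ULP, is about 3e-8.)

System.out.println(Math.ulp(0.99999994f));       // 5.9604645E-8, i.e. 2^-24
System.out.println(Math.ulp(0.99999994f) / 2);   // 2.9802322E-8, i.e. half a ULP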

One advantage of asking students to optimise their polynomial function on a range like 0-pi is that you can exhaustively test it for every possible valid input value x fairly quickly, something that is usually impossible for double precision functions. A proper framework for doing this much more thoroughly than my hack below was included in a post by @njuffa about approximating erfc.
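
A minimal Java counterpart of such a sweep (my sketch; approxCos stands for whatever candidate you want to test against Math.cos, and it takes a few minutes because there are on the order of a billion floats in the interval) could be:

// Sketch: visit every representable float in [0, pi] with Math.nextUp and
// record the worst absolute error of a candidate approximation.
static void exhaustiveTest(java.util.function.DoubleUnaryOperator approxCos) {
    final float PI = 3.1415927f;
    double worst = 0.0;
    float worstX = 0.0f;
    for (float x = 0.0f; x <= PI; x = Math.nextUp(x)) {
        double err = Math.abs(approxCos.applyAsDouble(x) - Math.cos(x));
        if (err > worst) { worst = err; worstX = x; }
    }
    System.out.printf("worst |error| = %g at x = %.8f%n", worst, worstX);
}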

The test reveals that the OP's solution and the book solution are not that different, but the officially recommended method is 30x slower, or just 10x slower if you cache N!. This is all down to using pow(x,N), including the slight rounding differences in the sum, and repeatedly recomputing factorial N (which leads to inaccuracies for N >= 14).

Curiously, for a basic Taylor series expansion the worst case error is not always right at the end of the range, something particularly noticeable on the methods using pow().

Here is the results table:

Description              cos(pi)       error         min_error    x_min        max_error    x_max        time (s)
prof Taylor             -0.99989957    0.000100434   -1.436e-07   0.94130510   0.000100672  3.14159226   10.752
pow Taylor              -0.99989957    0.000100434   -1.436e-07   0.94130510   0.000100672  3.14159226    2.748
your Cosinus            -0.99989957    0.000100434   -1.570e-07   0.80652559   0.000100791  3.14157438    0.301
my Taylor               -0.99989951    0.000100493   -5.476e-08   1.00042307   0.000100493  3.14159274    0.237
shoot Taylor            -0.99999595    4.0531e-06    -4.155e-06   2.84360051   4.172e-06    3.14159012    0.26
Horner Chebyshev 6      -1.00000095   -9.537e-07     -1.330e-06   3.14106655   9.502e-07    2.21509051    0.177
double Horner Cheby 7   -1.00000000    0             -7.393e-08   2.34867692   8.574e-08    2.10044718    0.188

Here is the code that can be used to experiment with the various options. The code is C rather than Java but written in such a way that it should be easily ported to Java.

#include <stdio.h>
#include <math.h>  
#include <time.h>

#define SLOW   // to enable the official book answer 
//#define DIVIDE  // use explicit division vs multiply by precomputed reciprocal 

double TaylorCn[10], dFac[20], drFac[20];
float shootC6;
float Fac[20];
float C6[7] = { 0.99999922f, -0.499994268f, 0.0416598222f, -0.001385891596f, 2.42044015e-05f, -2.19788836e-07f }; // original 240 bit rounded down to float32 
// ref float C7[8] = { 0.99999999f, -0.499999892f, 0.0416664902f, -0.001388780783f, 2.47699662e-05f, -2.70797754e-07f, 1.724760709e-9f }; // original 240 bit rounded down to float32
float C7[8] = { 0.99999999f, -0.499999892f, 0.0416664902f, -0.001388780783f, 2.47699662e-05f, -2.707977e-07f, 1.72478e-9f };  // after simple fine tuning
double dC7[8] = { 0.9999999891722795, -0.4999998918375135482, 0.04166649019522770258731, -0.0013887807826936648, 2.47699662157542654e-05, -2.707977544202106e-07, 1.7247607089243954e-09 };
// Chebyshev equal ripple approximations obtained from modified ARMremez rational approximation code
// C7 +/- 1.08e-8 (computed using 240bit FP arithmetic - coefficients are not fully optimised for float arithmetic) actually obtain 9e-8 (could do better?)
// C6 +/- 7.78e-7 actually obtain 1.33e-6 (with fine tuning could do better)

const float pi = 3.1415926535f;

float TaylorCos(float x, int ordnung)    
{
    double sum, term,  mx2;
    sum = term = 1.0;
    mx2 = -x * x;
    for (int i = 2; i <= ordnung; i+=2) {
        term *= mx2 ;
#ifdef DIVIDE
        sum += term / Fac[i]; // slower when using divide
#else
        sum += term * drFac[i]; // faster to multiply by reciprocal
#endif
    }
    return (float) sum;
}

float fTaylorCos(float x)
{
    return TaylorCos(x, 12);
}

void InitTaylor()
{
    float x2, x4, x8, x12;
    TaylorCn[0] = 1.0;
    for (int i = 1; i < 10; i++) TaylorCn[i] = TaylorCn[i - 1] / (2 * i * (2 * i - 1)); // precompute the coefficients
    Fac[0] = 1;
    drFac[0] = dFac[0] = 1;
    for (int i = 1; i < 20; i++)
    {
      Fac[i] = i * Fac[i - 1];
      dFac[i] = i * dFac[i - 1];
      drFac[i] = 1.0 / dFac[i];
      if ((double)Fac[i] != dFac[i]) printf("float factorial fails for %i! %18.0f should be %18.0f error %10.0f ( %6.5f ppm)\n", i, Fac[i], dFac[i], dFac[i]-Fac[i], 1e6*(1.0-Fac[i]/dFac[i]));
    }
   x2 = pi * pi;
   x4 = x2 * x2;
   x8 = x4 * x4;
   x12 = x4 * x8;
   shootC6 = (float)(cos((double)pi) - TaylorCos(pi, 10)) / x12 * 1.00221f;   // fiddle factor for shootC6 with 7 terms  *1.00128;
}

float shootTaylorCos(float x)
{
    float x2, x4, x8, x12;
    x2 = x * x;
    x4 = x2 * x2;
    x8 = x4 * x4;
    x12 = x4 * x8;
    return TaylorCos(x, 10) + shootC6 * x12;
}

float berechneCosinus(float x, int ordnung) {
    float sum, term, mx2;
    sum = term = 1.0f;
    mx2 = -x * x;
    for (int i = 1; i <= (ordnung + 1) / 2; i++) {
        int n = 2 * i;
        term *= mx2 / ((n-1) * n);
        sum += term;
    }
    return sum;
}

float Cosinus(float x)
{
    return berechneCosinus(x, 12);
}

float factorial(int n)
{
    float result = 1.0f;    
    for (int i = 2; i <= n; i++)
        result *= i;
    return result;
}

float profTaylorCos_core(float x, int n)
{
    float sum, term, mx2;
    sum = term = 1.0f;
    for (int i = 2; i <= n; i += 2) {
        term *= -1;
        sum += term*pow(x,i)/factorial(i);
    }
    return (float)sum;
}

float profTaylorCos(float x)
{
    return profTaylorCos_core(x, 12);
}

float powTaylorCos_core(float x, int n)
{
    float sum, term;
    sum = term = 1.0f;
    for (int i = 2; i <= n; i += 2) {
        term *= -1;
        sum += term * pow(x, i) / Fac[i];
    }
    return (float)sum;
}

float powTaylorCos(float x)
{
    return powTaylorCos_core(x, 12);
}

float Cheby6Cos(float x)
{
    float sum, term, x2;
    sum = term = 1.0f;
    x2 = x * x;
    for (int i = 1; i < 6; i++) {
        term *= x2;
        sum += term * C6[i]; 
    }
    return sum;
}

float dHCheby7Cos(float x)
{
    double x2 = x*x;
    return (float)(dC7[0] + x2 * (dC7[1] + x2 * (dC7[2] + x2 * (dC7[3] + x2 * (dC7[4] + x2 * (dC7[5] + x2 * dC7[6])))))); // cos 7 terms
}

float HCheby6Cos(float x)
{
    float x2 = x * x;
    return C6[0] + x2 * (C6[1] + x2 * (C6[2] + x2 * (C6[3] + x2 * (C6[4] + x2 * C6[5])))); // cos 6 terms
}


void test(const char *name, float(*myfun)(float), double (*ref_fun)(double), double xstart, double xend)
{
    float cospi, cpi_err, x, ox, dx, xmax, xmin;
    double err, res, ref, maxerr, minerr;
    time_t start, end;
    x = xstart;
    ox = -1.0;
//    dx = 1.2e-7f;
    dx = 2.9802322387695312e-8f; // chosen to test key ranges of the function exhaustively
    maxerr = minerr = 0;
    xmin = xmax = 0.0;
    start = clock();
    while (x <= xend) {
        res = (*myfun)(x);
        ref = (*ref_fun)(x);
        err = res - ref;
        if (err > maxerr) {
            maxerr = err;
            xmax = x;
        }
        if (err < minerr) {
            minerr = err;
            xmin = x;
        }
        x += dx;
        if (x == ox) dx += dx;
        ox = x;
    }
    end = clock();
    cospi = (*myfun)(pi);
    cpi_err = cospi - cos(pi);
    printf("%-22s  %10.8f  %12g %12g  @  %10.8f  %12g  @  %10.8f  %g\n", name, cospi, cpi_err, minerr, xmin, maxerr, xmax, (float)(end - start) / CLOCKS_PER_SEC);
}

void OriginalTest(const char* name, float(*myfun)(float, int), float target, float x)
{
    printf("%s cos(%10.7f) using terms upto x^N\n N \t result  error\n",name, x);
    for (int i = 0; i < 19; i += 2) {
        float cx, err;
        cx = (*myfun)(x, i);
        err = cx - target;
        printf("%2i  %-12.9f  %12.5g\n", i, cx, err);
        if (err == 0.0) break;
    }
}

int main() {
    InitTaylor();    // note that factorial 14 cannot be represented accurately as a 32 bit float and is truncated.
                     // easy sanity check on factorial numbers is to count the number of trailing zeroes.

    float x = pi; // approx. PI
    OriginalTest("Taylor Cosinus", berechneCosinus, cos(x), x);
    OriginalTest("Taylor Cosinus", berechneCosinus, cos(x/2), x/2);
    OriginalTest("Taylor Cosinus", berechneCosinus, cos(x/4), x/4);

    printf("\nHow it would actually be done using equal ripple polynomial on 0-pi\n\n");
    printf("Chebyshev equal ripple   cos(pi) 6 terms %12.8f (sum order x^0 to x^N)\n", Cheby6Cos(x));
    printf("Horner optimum Chebyshev cos(pi) 6 terms %12.8f (sum order x^N to x^0)\n", HCheby6Cos(x));
    printf("Horner optimum Chebyshev cos(pi) 7 terms %12.8f (sum order x^N to x^0)\n\n", dHCheby7Cos(x));

    printf("Performance and functionality tests of versions - professor's solution is 10x slower ~2s on an i5-12600 (please wait)...\n");
    printf(" Description \t\t cos(pi)      error \t    min_error \t    x_min\tmax_error \t x_max \t  time\n");
#ifdef SLOW
    test("prof Taylor", profTaylorCos, cos, 0.0, pi);
    test("pow  Taylor", powTaylorCos,  cos, 0.0, pi);
#endif
    test("your Cosinus",  Cosinus, cos, 0.0, pi);
    test("my Taylor",  fTaylorCos, cos, 0.0, pi);
    test("shoot Taylor", shootTaylorCos, cos, 0.0, pi);
    test("Horner Chebyshev 6",  HCheby6Cos, cos, 0.0, pi);
    test("double Horner Cheby 7", dHCheby7Cos, cos, 0.0, pi);
    return 0;
}

It is interesting to make the sum and x2 variables double precision and observe the effect that has on the answers. If someone fancies running simulated annealing or another global optimiser to find the best possible optimised Chebyshev 6 & 7 float32 approximations please post the results.

I agree wholeheartedly with Steve Summit's final comments. You should think very carefully about the risk of overflow of intermediate results and the order of summation when doing numerical calculations. Numerical analysis using floating point numbers follows different rules from pure mathematics, and some rearrangements of an equation are very much better than others when you want to compute an accurate numerical value.


I can't say why we should do it like the way he wants...

That's too bad. It doesn't sound like someone is doing a very good job at teaching.

I'm not a numerical analyst, so I'm not sure I can teach you what's going on, either, but as it happens I have studied this example. Here's what I can tell you about it.

In general, one of the common mistakes people make when coding floating-point expressions is to type them in straight from the mathematical formulation. But floating-point arithmetic does not always obey the same rules as pure mathematics, so it's not at all uncommon for the naïve approach to run into trouble.

In this case your professor's expression

acc += (a / berechneFak(i)) * Math.pow(x, i);

is more or less straight from the mathematical definition of the Taylor series. But it's got two related problems:

  1. It does a lot of unnecessary multiplication. At iteration i+1, it essentially recomputes berechneFak(i) and Math.pow(x, i) on the way to computing them for i+1.
  2. Terms like berechneFak(i) and Math.pow(x, i) can get very big, very fast. That's not a problem in pure mathematics, but the range and precision of computer floating point numbers are limited. If a term overflows, it can demolish your results. When you have something like x = y/z, where y and z are both very big, you may lose precision in the quotient x even though x is nice and small and theoretically perfectly representable.

Here, there's a great way to address both problems. If you've already computed the factorial berechneFak(i), then on the next iteration you can simply multiply it by i+1 to get berechneFak(i+1). If you've already computed Math.pow(x, i), then you can simply multiply it by x again to get Math.pow(x, i+1). And if you perform both operations on a single running quotient variable term, as you did, you minimize the magnitude of the numbers involved, which reduces the possibilities for both overflow and precision loss.
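
A compact Java sketch of that running-term idea (my wording of it, not the original poster's code, but using the same signature):

// Sketch: one running term carries both the power of x and the factorial,
// so nothing is recomputed from scratch and intermediate values stay small.
public float berechneCosinus(float x, int ordnung) {
    float acc = 0.0f;
    float term = 1.0f;                           // x^0 / 0!
    for (int n = 0; n < ordnung; n += 2) {
        acc += term;
        term *= -x * x / ((n + 1) * (n + 2));    // advance to the x^(n+2) term
    }
    return acc;
}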

So, based on these arguments, your implementation should perform quite a bit better than the one suggested by your instructor.

But for this particular Taylor series, the problem with the arguments I've presented is that, in my experience, they don't end up making much difference in practice. The efficiency argument is probably real. But it ends up being hard to show that the hypothetical inaccuracies due to overflow and precision loss will actually occur.

Assuming you've reduced the range of the input x properly, x won't be large and so Math.pow(x, i) won't grow so fast. And when you're computing y/z, even when y and z are large, the properties of division and of IEEE-754 floating point mean that you usually get a good result — you don't lose so much precision — after all. Finally, as I mentioned in a comment, the Taylor series for sin and cos are so darn good that even a naïve implementation tends to converge to a good answer, and quickly. (That's why, for the problem you chose, your implementation and the professor's gave practically identical results.)

In my experience, when it comes to sine and cosine, the only way to demonstrate that the "better" implementation really is better is to deliberately omit the range reduction step. For example, using the improved technique, in single precision, and computing the sine instead of the cosine, if you try to compute sin(14) — that is, 14 radians — after 20 iterations you'll get 0.9817389, which is somewhat close to the correct answer of 0.9879668. But the naïve approach gives 2.082662, which is not only completely wrong, it's obviously not a sine value at all.

(I'd like to present that worked-out example here, but as I mentioned it's for sine instead of cosine, and I've been investigating it using C, not Java.)

What's the bottom line? There are three or four take-home lessons, I think:

(1) In general, beware of typing in mathematical expressions straight from the definition.

(2) In general, well-chosen rearrangements, which are theoretically mathematically equivalent but which work around the various limitations of floating point, can be an excellent idea.

(3) It sounds like you might not always want to take this particular instructor's teachings to heart.

But also,

(4) As I mentioned, I am not a numerical analyst, so I might not really know what I'm talking about here, either.
