1

I am looking for an even faster method of calculating the first N Fibonacci numbers. I am already aware of the recursive method with memoization, as well as the simpler method of iterating from 1 to N. However, is there a mathematical way to solve this problem, or another algorithm that can make the process even slightly faster?

My current implementation, using memoization and recursion, is quite fast, but it does not meet my requirements.

#include<bits/stdc++.h>
using namespace std;
 
int fib(int n) {
    int a = 0, b = 1, c, i;
    if( n == 0)
        return a;
    for(i = 2; i <= n; i++)
    {
       c = a + b;
       a = b;
       b = c;
    }
    return b;
}


int main()
{
    int n = 9;
    cout << fib(n);
}
6 Comments
  • 2
    Don't do this: #include<bits/stdc++.h> -- This is a non-standard header, and it brings in an umpteen number of unnecessary header files. All your program needs is #include <iostream> Commented May 26, 2023 at 23:39
  • 2
    This method is not memoizing anything. Commented May 26, 2023 at 23:52
  • 1
    What are the requirements you desire? Are you aware of the Q-Matrix method? Commented May 27, 2023 at 0:00
  • There's a matrix formulation to represent the Fibonacci calculation, and Fib(n) is the n^th power of the matrix. This can be calculated with O(log(n)) work using fast exponentiation via squaring the matrix and halving the exponent. However, unless you use big-integer libraries to go for values bigger than Fib(46), this would be totally dominated by table lookup. Also, I don't have a C++ implementation. Here it is in Ruby if you're interested, it would be easy to port other than the arbitrary size integer arithmetic that Ruby or Python provide. Commented May 27, 2023 at 0:07
  • 3
    First N numbers or N-th number? Commented May 27, 2023 at 0:58

4 Answers

4

Don't deny the power of a table lookup.

A closed-form mathematical solution is usually faster. However, some of the mathematical solutions presented and discussed on this page rely on floating-point math and the pow function, both of which can be imprecise.

Fib(46) is the largest Fibonacci number that fits in a 32-bit signed int, and Fib(92) is the largest that fits in a signed 64-bit integer. Hence, simply hardcoding the answers is a reasonable thing to do.

I realize this isn't the answer you wanted from a computational perspective. But if you really needed to compute fib(n) in a real world setting with reasonable values that don't cause overflow or have floating point deviations on different platforms, it's hard to argue with this.

int fib32(int n) {
    const static int table[] = {
    0,1,1,2,3,5,8,13,21,34,55,89,144,233,377,610,987,1597,2584,4181,6765,10946,17711,28657,46368,75025,121393,196418,317811,514229,832040,
    1346269,2178309,3524578,5702887,9227465,14930352,24157817,39088169,63245986,102334155,165580141,267914296,433494437,701408733,1134903170,
    1836311903 };

    if (n < 0 || n >= sizeof(table) / sizeof(table[0])) {
        return -1;
    }

    return table[n];
}

long long fib64(long long n) {
    const static long long table[] = {0LL, 1LL, 1LL, 2LL, 3LL, 5LL, 8LL, 13LL, 21LL, 34LL, 55LL, 89LL, 144LL, 233LL, 377LL, 610LL, 987LL, 1597LL, 2584LL, 4181LL, 6765LL, 10946LL, 17711LL, 28657LL, 46368LL, 75025LL, 121393LL, 196418LL, 317811LL, 514229LL, 832040LL, 1346269LL, 2178309LL,
        3524578LL, 5702887LL, 9227465LL, 14930352LL, 24157817LL, 39088169LL, 63245986LL, 102334155LL, 165580141LL, 267914296LL, 433494437LL, 701408733LL, 1134903170LL, 1836311903LL, 2971215073LL, 4807526976LL, 7778742049LL, 12586269025LL, 20365011074LL, 32951280099LL, 53316291173LL, 86267571272LL, 139583862445LL, 225851433717LL, 365435296162LL, 591286729879LL, 956722026041LL, 1548008755920LL, 2504730781961LL,
        4052739537881LL, 6557470319842LL, 10610209857723LL, 17167680177565LL, 27777890035288LL, 44945570212853LL, 72723460248141LL, 117669030460994LL, 190392490709135LL, 308061521170129LL, 498454011879264LL, 806515533049393LL, 1304969544928657LL, 2111485077978050LL, 3416454622906707LL, 5527939700884757LL, 8944394323791464LL, 14472334024676221LL, 23416728348467685LL, 37889062373143906LL, 61305790721611591LL, 99194853094755497LL, 160500643816367088LL,
        259695496911122585LL, 420196140727489673LL, 679891637638612258LL, 1100087778366101931LL, 1779979416004714189LL, 2880067194370816120LL, 4660046610375530309LL, 7540113804746346429LL};

    if (n < 0 || n >= sizeof(table)/sizeof(table[0])) {
        return -1;
    }

    return table[n];
}


2

Use Binet's formula.

Here is a rudimentary implementation in C++:

#include <cmath>
#include <cstdint> // for uint64_t
#include <iostream>

const double onesq5 = 0.44721359549995793928183473374626; // 1/sqrt(5)
const double phi1 = 1.6180339887498948482045868343656; // (1 + sqrt(5)) / 2
const double phi2 = -0.61803398874989484820458683436564; // (1 - sqrt(5)) / 2

uint64_t fib(int n)
{
    return onesq5 * (pow(phi1, n) - pow(phi2, n));
}

int main()
{
    for (int i = 1; i < 50; ++i)
       std::cout << fib(i) << "\n";
}


  1. There is no iteration or recursion to get the nth fibonacci number -- it is computed directly.

  2. Memoization would gain nothing here, since each fib(n) is computed directly and pow is never called repeatedly with the same arguments.


1

None of the examples so far uses the full power of C++ to create the table at compile time, like this (https://godbolt.org/z/nMeqj6xvq):

#include <array>
#include <cstdint>
#include <iostream>

//construct the table at compile time

template<std::size_t N>
static constexpr auto create_fibonacci_table()
{
    std::array<std::uint64_t, N> table{ 0,1 };

    for (std::size_t n = 2; n < table.size(); ++n)
    {
        table[n] = table[n - 1ul] + table[n - 2ul];
    }

    return table;
}

int main()
{
    static constexpr auto fibonacci_table = create_fibonacci_table<12>();
    std::cout << fibonacci_table[9ul];
    return 0;
}


1

Just for fun, I did a quick speed comparison with the following code:

#include <chrono>
#include <cmath>
#include <cstdint> // for uint64_t
#include <iostream>
#include <locale>

unsigned long long fib(unsigned input) {
    unsigned long long old1 = 0;
    unsigned long long old2 = 1;

    for (unsigned i = 1; i < input; i++) {
        unsigned long long sum = old1 + old2;
        old1 = old2;
        old2 = sum;
    }
    return old2;
}

unsigned long long fib64(long long n)
{
    const static unsigned long long table[] = { 0ULL, 1ULL, 1ULL, 2ULL, 3ULL, 5ULL, 8ULL, 13ULL, 21ULL, 34ULL, 55ULL, 89ULL, 144ULL, 233ULL, 377ULL, 610ULL, 987ULL, 1597ULL, 2584ULL, 4181ULL, 6765ULL, 10946ULL, 17711ULL, 28657ULL, 46368ULL, 75025ULL, 121393ULL, 196418ULL, 317811ULL, 514229ULL, 832040ULL, 1346269ULL, 2178309ULL,
        3524578ULL, 5702887ULL, 9227465ULL, 14930352ULL, 24157817ULL, 39088169ULL, 63245986ULL, 102334155ULL, 165580141ULL, 267914296ULL, 433494437ULL, 701408733ULL, 1134903170ULL, 1836311903ULL, 2971215073ULL, 4807526976ULL, 7778742049ULL, 12586269025ULL, 20365011074ULL, 32951280099ULL, 53316291173ULL, 86267571272ULL, 139583862445ULL, 225851433717ULL, 365435296162ULL, 591286729879ULL, 956722026041ULL, 1548008755920ULL, 2504730781961ULL,
        4052739537881ULL, 6557470319842ULL, 10610209857723ULL, 17167680177565ULL, 27777890035288ULL, 44945570212853ULL, 72723460248141ULL, 117669030460994ULL, 190392490709135ULL, 308061521170129ULL, 498454011879264ULL, 806515533049393ULL, 1304969544928657ULL, 2111485077978050ULL, 3416454622906707ULL, 5527939700884757ULL, 8944394323791464ULL, 14472334024676221ULL, 23416728348467685ULL, 37889062373143906ULL, 61305790721611591ULL, 99194853094755497ULL, 160500643816367088ULL,
        259695496911122585ULL, 420196140727489673ULL, 679891637638612258ULL, 1100087778366101931ULL, 1779979416004714189ULL, 2880067194370816120ULL, 4660046610375530309ULL, 7540113804746346429ULL, 12200160415121876738ULL };

    if (n < 0 || n >= sizeof(table) / sizeof(table[0])) {
        return -1;
    }

    return table[n];
}

const double onesq5 = 0.44721359549995793928183473374626; // 1/sqrt(5)
const double phi1 = 1.6180339887498948482045868343656; // (1 + sqrt(5)) / 2
const double phi2 = -0.61803398874989484820458683436564; // (1 - sqrt(5)) / 2

uint64_t fib_binet(int n)
{
    return onesq5 * (pow(phi1, n) - pow(phi2, n));
}

int main() {
    using namespace std::chrono;

    auto start = high_resolution_clock::now();
    auto res = fib(93);
    auto stop = high_resolution_clock::now();

    auto start2 = high_resolution_clock::now();
    auto res2 = fib64(92);
    auto stop2 = high_resolution_clock::now();

    auto start3 = high_resolution_clock::now();
    auto res3 = fib_binet(92);
    auto stop3 = high_resolution_clock::now();

    std::cout.imbue(std::locale(""));
    std::cout << res << "\t"<< res2 << "\t" << res3 << "\n";
    std::cout << "iteration time: " << duration_cast<nanoseconds>(stop - start) << "\n";
    std::cout << "Lookup time: " << duration_cast<nanoseconds>(stop2 - start2) << "\n";
    std::cout << "Binet time: " << duration_cast<nanoseconds>(stop3 - start3) << "\n";
}

The results I got are as follows:

12,200,160,415,121,876,738      12,200,160,415,121,876,738      12,200,160,415,121,913,856
iteration time: 0ns
Lookup time: 0ns
Binet time: 10,691ns

Observations:

  • although it's theoretically exact, when implemented on a computer using floating point arithmetic, Binet's formula can produce imprecise results.
  • Although it's clear it'll win when the numbers get large enough, for a result that'll fit in a 64-bit integer, Binet's formula is comparatively slow.
  • The clock implementation I'm using seems to have a resolution of about 427 ns, which isn't fine enough to measure the other two meaningfully (hence the 0 ns readings).

Speculating a bit, choosing between iteration and table lookup may not be trivial. I'd guess a great deal here depends on the call pattern. The table lookup is obviously O(1), but the constants involved can vary widely: data served from cache is quite fast, while data fetched from main memory is considerably slower. If you call it repeatedly in a tight loop to get all of fib(1) through fib(93), the access pattern is quite predictable, so on a typical CPU all but the first cache line will be prefetched into the cache, and the total time will be 1 main-memory read + 92 cache reads.

A main-memory read on a recent CPU takes roughly 42 clock cycles + 51 ns. Subsequent cache reads likely come from L1 cache, at about 4 clocks apiece. So for this tight-loop pattern we get a total of about 92 × 4 clocks + 51 ns. At (say) 4 GHz, that's roughly 92 ns + 51 ns, or about 143 ns total.

But if we interleave calls to this with a lot of other code, so essentially all of the reads come from main memory instead of cache, then computing all of them ends up as something like 93 * (42 clocks + 51ns). In this case, on our hypothetical 4 GHz processor, it ends up around 5,700 ns.

By contrast, the iterative algorithm likely needs about one clock per iteration (one add, plus two moves that are probably handled as register renames in the reorder buffer). Computing fib(1) through fib(93) averages 46.5 iterations each, for a total of about 46.5 × 93 ≈ 4,325 clocks. At 4 GHz, that's about 1,100 ns.

So, to compute them all, the iterative solution is anywhere from about 10 times slower to about 5 times faster.

Side note: streaming a duration directly requires C++20, so depending on the compiler you use, it may or may not be able to print the final time of each algorithm as written. In that case, change duration_cast<nanoseconds>(stop - start) to something like duration_cast<nanoseconds>(stop - start).count() << "ns".


Reference

Here's a page of results from tests on memory access speed (with the caveat that there's obviously variation depending on both the processor and memory you use).

https://www.7-cpu.com/cpu/Skylake.html

8 Comments

@KellyBundy Well, that's the code from PaulMcKenzie's answer, they just missed an #include <cmath>. The point that Binet's formula is not exact (from fib(72)) and expensive in the 64-bit range still stands.
@Bob__ Thanks. I don't use godbolt, I think it's one of the sites that don't work well on my device. Anyway, isn't 10,691ns suspiciously enormously slow? Even Python only takes ~450 ns.
@KellyBundy: Ooops--sorry, when I copied the code, I accidentally missed the first line. I've corrected it.
Thanks. And I see you added discussion about the other two. What about the Binet one? Why is it so suspiciously enormously slow?
