C vs. C++: comparing function pointer tables and switch-case for multiple types support

Question

I have some API code that I need to rewrite. The original API is written in C++ and relies heavily on templates and modern C++ features like std::is_same_v. The primary purpose of this API is to read data from a file format that supports roughly 70 different types. For various reasons, I now need to rewrite this functionality in C. This is a strict requirement.

When I asked, "Why C?", the answer I received was that it compiles faster and produces faster executables. Fair enough. But I then proposed: "If I can demonstrate that the C++ code runs at roughly the same speed as C, can I implement it in C++ instead?" Eventually, the team agreed to let me try.

Problem Statement

To replicate support for the 70 different types in C (and to benchmark the feasibility of C vs. C++), I explored two approaches to read data of a given type from the file:

Using a switch-case construct:
Here, I assign an enum value to each supported type, allowing me to write something like:
```
switch (input_type) {
    case 1: /* read data of type vec1 */ break;
    ...
    case 70: /* read data of type vec70 */ break;
}
```
This approach is straightforward, but a switch statement with 70 cases feels unwieldy and potentially inefficient.
Using function pointers:
In this approach, I maintain a static table of function pointers, with each entry pointing to the appropriate function to handle a specific type. This is somewhat akin to a C++-style virtual table (vtable), where each supported type maps to its corresponding read function. I was curious to compare the performance of this approach with the switch-case approach.
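As a rough illustration of the two dispatch styles described above, here is a minimal sketch with three hypothetical types instead of 70. The names `type_id`, `read_fn`, `read_table`, `dispatch_table` and `dispatch_switch` are assumptions made for this example and are not part of the real API. Every handler takes a `const void *` directly, so the table stores pointers of the exact function type and no function-pointer cast is required.

```c
#include <assert.h>

/* Hypothetical type tags; the real format would have about 70 of these. */
typedef enum { TYPE_I32, TYPE_F32, TYPE_F64, TYPE_COUNT } type_id;

/* Every handler shares this signature, so the table needs no casts. */
typedef double (*read_fn)(const void *src);

static double read_i32(const void *src) { return (double)*(const int *)src; }
static double read_f32(const void *src) { return (double)*(const float *)src; }
static double read_f64(const void *src) { return *(const double *)src; }

/* Static table filled in at compile time with designated initializers. */
static const read_fn read_table[TYPE_COUNT] = {
    [TYPE_I32] = read_i32,
    [TYPE_F32] = read_f32,
    [TYPE_F64] = read_f64,
};

/* Table dispatch: one range check, then one indirect call. */
double dispatch_table(type_id t, const void *src) {
    assert((unsigned)t < (unsigned)TYPE_COUNT);
    return read_table[t](src);
}

/* Equivalent switch dispatch over the same handlers. */
double dispatch_switch(type_id t, const void *src) {
    switch (t) {
        case TYPE_I32: return read_i32(src);
        case TYPE_F32: return read_f32(src);
        case TYPE_F64: return read_f64(src);
        default:       assert(!"unknown type id"); return 0.0;
    }
}
```

Both functions return the same value for the same tag and pointer, for example `dispatch_table(TYPE_F32, &f)` and `dispatch_switch(TYPE_F32, &f)` for a `float f`, so either could sit behind the same public entry point.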

C Implementation and Results

Below is my attempt at implementing this in C:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>

#define NUM_TYPES 70
#define ARRAY_SIZE 70
#define NUM_ITERATIONS 100000

// Define 70 unique vector types
#define DEFINE_VEC_TYPE(N) \
    typedef struct {       \
        float data[N];     \
    } vec_##N;

#define TEST_VEC_TYPES(f) \
    f(1) f(2) f(3) f(4) f(5) f(6) f(7) f(8) f(9) f(10) \
    f(11) f(12) f(13) f(14) f(15) f(16) f(17) f(18) f(19) f(20) \
    f(21) f(22) f(23) f(24) f(25) f(26) f(27) f(28) f(29) f(30) \
    f(31) f(32) f(33) f(34) f(35) f(36) f(37) f(38) f(39) f(40) \
    f(41) f(42) f(43) f(44) f(45) f(46) f(47) f(48) f(49) f(50) \
    f(51) f(52) f(53) f(54) f(55) f(56) f(57) f(58) f(59) f(60) \
    f(61) f(62) f(63) f(64) f(65) f(66) f(67) f(68) f(69) f(70)

TEST_VEC_TYPES(DEFINE_VEC_TYPE)

#define DECLARE_FUNCTION(N) void process_type_##N(vec_##N* v);
TEST_VEC_TYPES(DECLARE_FUNCTION)

// Define 70 unique functions
#define DEFINE_FUNCTION(N)                                 \
    void process_type_##N(vec_##N* v) {                    \
        for (int i = 0; i < N; ++i) {                      \
            double result = 0.0;                           \
            for (int j = 0; j < N; ++j) {                  \
                result += sin(v->data[i] + j);             \
            }                                              \
            v->data[i] = (float)result;                    \
        }                                                  \
    }
TEST_VEC_TYPES(DEFINE_FUNCTION)

typedef void (*process_func)(void* v);
process_func func_table[NUM_TYPES + 1];

#define INIT_FUNC_TABLE(N) func_table[N] = (process_func)process_type_##N;
void initialize_func_table() {
    TEST_VEC_TYPES(INIT_FUNC_TABLE)
}

#define GEN_CASE(N) case N: process_type_##N((vec_##N*)v); break;
void switch_case(int type_index, void* v) {
    switch (type_index) {
        TEST_VEC_TYPES(GEN_CASE)
    }
}

void benchmark_function_pointer(void** data) {
    clock_t start = clock();
    for (int i = 0; i < NUM_ITERATIONS; ++i) {
        int type_index = (i % NUM_TYPES) + 1;
        func_table[type_index](data[type_index]);
    }
    clock_t end = clock();
    printf("Function pointer table time (C): %.2f seconds\n",
           (double)(end - start) / CLOCKS_PER_SEC);
}

void benchmark_switch_case(void** data) {
    clock_t start = clock();
    for (int i = 0; i < NUM_ITERATIONS; ++i) {
        int type_index = (i % NUM_TYPES) + 1;
        switch_case(type_index, data[type_index]);
    }
    clock_t end = clock();
    printf("Switch-case time (C): %.2f seconds\n",
           (double)(end - start) / CLOCKS_PER_SEC);
}

int main() {
    initialize_func_table();

    void* data[NUM_TYPES + 1];
    for (int i = 1; i <= NUM_TYPES; ++i) {
        data[i] = malloc(sizeof(vec_1) * i);
        for (int j = 0; j < i; ++j) {
            ((vec_1*)data[i])->data[j] = (float)(i + j);
        }
    }

    benchmark_switch_case(data);
    benchmark_function_pointer(data);

    for (int i = 1; i <= NUM_TYPES; ++i) {
        free(data[i]);
    }

    return 0;
}

Compile and run with:

clang -std=c23 test1.c -O3
./a.exe

Results:

Switch-case time (C): 1.61 seconds
Function pointer table time (C): 1.62 seconds

C++ Implementation and Results

In the C++ implementation, I used templates to generate the vec<N> types and their corresponding functions. Here is the code:

// C++ implementation
#include <iostream>
#include <vector>
#include <functional>
#include <cmath>
#include <ctime>
#include <array>
#include <memory>

//  70 different vec<N> types
template <size_t N>
struct vec {
    float data[N];
};

using FunctionPointer = std::function<void(void*)>;
std::vector<FunctionPointer> func_table;

template <size_t N>
inline void process_vec(void* vec_ptr) {
    vec<N>* v = static_cast<vec<N>*>(vec_ptr);
    for (size_t i = 0; i < N; ++i) {
        double result = 0.0;
        for (size_t j = 0; j < N; ++j) {
            result += std::sin(v->data[j] + j);
        }
        v->data[i] = static_cast<float>(result);
    }
}

template <size_t N>
void register_function() {
    func_table[N - 1] = [](void* vec_ptr) { process_vec<N>(vec_ptr); };
}

// registration for all 70 types
void do_registration() {
    func_table.resize(70);
    ([]<size_t... Is>(std::index_sequence<Is...>) {
        (register_function<Is + 1>(), ...);
    })(std::make_index_sequence<70>{});
}

template <size_t N>
vec<N> initialize_vec() {
    vec<N> v;
    for (size_t i = 0; i < N; ++i) {
        v.data[i] = static_cast<float>(N + i);
    }
    return v;
}

int main() {
    do_registration();

    std::vector<std::unique_ptr<void, void(*)(void*)>> data;

    ([&data]<size_t... Is>(std::index_sequence<Is...>) {
        ((data.emplace_back(
            new vec<Is + 1>(initialize_vec<Is + 1>()),
            [](void* ptr) { delete static_cast<vec<Is + 1>*>(ptr); })), ...);
    })(std::make_index_sequence<70>{});

    const size_t num_iterations = 100000;
    clock_t start = std::clock();
    for (size_t i = 0; i < num_iterations; ++i) {
        size_t type_index = i % 70;
        func_table[type_index](data[type_index].get());
    }
    clock_t end = std::clock();

    std::cout << "Function pointer table time (C++): "
              << static_cast<double>(end - start) / CLOCKS_PER_SEC
              << " seconds\n";

    return 0;
}

Results:

clang++ -std=c++23 test2.cpp -O3
./a.exe
Function pointer table time (C++): 1.655 seconds

Before I comment on the results, I want to stress—because I think this is the first feedback I am going to get—that it’s super hard to compare apples to bananas. I know that. However, the results being this close makes me think that the benchmark is probably reasonably fair. I am not looking at the generated assembly because I am not good enough to read assembly code, but doing so might provide some interesting insights.

Summary and Questions

While the results for both implementations are very close, the C++ code took significantly longer to compile and produced larger executables (as expected). This makes me wonder if the runtime comparison is fair, as compiler optimizations and generated assembly might differ.

I would love feedback on:

Potential improvements to the C implementation. For example, is there an alternative to the macros other than duplicating code, or an alternative to the two approaches above?
Whether the approach to benchmarking seems reasonable.

Please provide a scenario in which every "file format driver" will be invoked (perhaps sequentially), and the time to select each one is significant (compared to the driver 'running'). Please explain how "a strict requirement" can be made flexible and time funded to pursue alternative strategies... What are the flexibilities offered by a "switch" that are not available when using a function pointer solution?
– user272752, Commented Jan 4 at 22:56
"Please explain how "a strict requirement" can be made flexible and time funded to pursue alternative strategies... ". I guess the people I work with are open-minded :). I understand the example is very limited, but would a real example, in which the types are read from the file and passed to the function filtering them, make much of a difference compared with just generating them on the fly as I do in the example?
– M. Saintourens, Commented Jan 5 at 11:20
I can only guess. What I see here is versions of a "dispatcher" that sequentially runs a collection of "time wasting" functions that are all the same. The dispatcher is likely 0.0001% of the measured execution time, proving nothing about switch() vs function pointer... Since this is not "real code", there's very little that should/could be said about it. The question of 'C' vs 'C++' should be answered by which has the greater pool of expertise in the group. Can the junior programmer deal conceptually with templates? "Strict requirements" are not 'flexible' in my experience...
– user272752, Commented Jan 5 at 11:38
@Fe2O3 yes, I see your point. From my experience of that project, the devs walked away from C++ after years of using it, because its current "complexity" makes the code harder for people to read. I am just vaguely quoting what I understood from my interactions with them. I get your point about the code being too simple to show value, but one has to start somewhere, and while I understand that the topic of C vs. C++ is otherwise subjective, I am still hoping to get constructive feedback here about the problem, notably on how to handle this efficiently in C, if that is possible at all.
– M. Saintourens, Commented Jan 5 at 12:03
@Fe2O3 ""Strict requirements" are not 'flexible' in my experience... ". Well, is that why we have mutable in C++? :) It is a way of saying you can't be changed, but you can still be changed if I decide so. An example of a strict requirement that's somehow flexible :).
– M. Saintourens, Commented Jan 5 at 12:06

G. Sliepen · Accepted Answer · 2025-01-05 16:15:28Z

About C vs. C++

> When I asked, "Why C?", the answer I received was that it compiles faster

A big slowdown for C++ often comes from heavy template use, as the compiler then has to instantiate and compile every generated instantiation.

That said, if you have one C++ template that is instantiated 70 times, it shouldn't be much slower than compiling 70 C functions that contain mostly duplicated code.

Macro expansion is not free either, so if you use a lot of macros as a replacement for templates, you are not better off.

> and produces faster executables.

There is no reason why that should be the case, unless you wrote very bad code to begin with. Often C++ code can generate faster executables than C thanks to its stronger type system.

Do you need 70 different types at all?

Instead of having 70 different vec<N> types (which are basically just std::array<float, N>), consider that you could just make a single type that holds up to 70 values:

struct vec {
    static constexpr std::size_t max_size = 70;
    float data[max_size];
    std::size_t size;
};

Then processing a "vec" becomes:

void process_vec(vec& v) {
    for (size_t i = 0; i < v.size; ++i) {
        double result = 0.0;
        for (size_t j = 0; j < v.size; ++j) {
            result += std::sin(v.data[j] + j);
        }
        v.data[i] = static_cast<float>(result);
    }
}

And this then also greatly simplifies main():

int main() {
    std::vector<vec> data(70);

    for (std::size_t i = 0; auto& v: data) {
        v.size = ++i;
        initialize_vec(v);
    }

    const size_t num_iterations = 100000;
    clock_t start = std::clock();
    for (size_t i = 0; i < num_iterations; ++i) {
        size_t type_index = i % 70;
        process_vec(data[type_index]);
    }
    clock_t end = std::clock();

    std::cout << "Function pointer table time (C++): "
              << static_cast<double>(end - start) / CLOCKS_PER_SEC
              << " seconds\n";
}

Once C++26 support comes to your compiler, you can use std::inplace_vector<float, 70> instead of creating your own vec.

Even if your types are actually more complex than this, you can probably use some form of type erasure to solve this issue.

This should not be slower than your code. In fact, since there is a single type vec, you can just allocate a std::vector or even a std::array of it, without needing to allocate the individual vecs, and there is therefore also less pointer indirection.

The drawback in the above code is the increase in memory usage. There are ways around that as well, for example you could just allocate a single array of 2485 floats and allocate chunks of 1, 2, …, 70 floats from that.
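The single-allocation idea can be sketched in C roughly as follows. This is one possible layout chosen for illustration, not code from the question: the vector with n elements starts at offset n(n-1)/2, so sizes 1 to 70 occupy exactly 70*71/2 = 2485 floats. The names `vec_pool`, `pool_offset`, `pool_get` and `pool_init` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SIZE    70
#define POOL_FLOATS (MAX_SIZE * (MAX_SIZE + 1) / 2)  /* 1 + 2 + ... + 70 = 2485 */

/* One contiguous buffer holding all 70 vectors back to back. */
typedef struct {
    float storage[POOL_FLOATS];
} vec_pool;

/* Offset of the vector with n elements: the vectors of size 1..n-1 precede it. */
static size_t pool_offset(size_t n) {
    assert(n >= 1 && n <= MAX_SIZE);
    return n * (n - 1) / 2;
}

/* Pointer to the first element of the vector with n elements. */
float *pool_get(vec_pool *pool, size_t n) {
    return pool->storage + pool_offset(n);
}

/* Fill every vector with n + j, the pattern the question's C code uses. */
void pool_init(vec_pool *pool) {
    for (size_t n = 1; n <= MAX_SIZE; ++n) {
        float *v = pool_get(pool, n);
        for (size_t j = 0; j < n; ++j) {
            v[j] = (float)(n + j);
        }
    }
}
```

A caller would declare one `vec_pool`, call `pool_init(&pool)` once, and then pass `pool_get(&pool, n)` together with `n` to the processing function, which avoids 70 separate allocations.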

Your benchmark results may be invalid

What part of the code are you actually trying to measure? The overhead of function call vs. switch statement? Maybe both are very small compared to the time it takes to actually execute the process…() functions. If so, the benchmark results are not very useful.

Note that sin() in C always works on doubles, meaning it will cast from float to double and back, whereas std::sin() will actually do the equivalent of C's sinf(). This may or may not result in a significant performance difference.

Also, sin() does not necessarily take a constant time; it could take longer or shorter depending on whether the value you pass it is very small (denormal) or very large (requiring range reduction). That's still fine if each benchmark gets fed the same values, but that's not the case for the function pointer benchmark in the C code!

Furthermore, the compiler can see that you are not actually using the result of the calculation. It is therefore allowed for it to just elide parts or even the whole for-loop. To avoid this, make sure it prints the results at the end (even if it's just the sum of all the values).
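A minimal sketch of how the two points above (identical inputs for every benchmark, and results the compiler cannot discard) might be handled in C. The helper names `reset_input`, `work`, `checksum` and `run_once` are hypothetical, and `work` is only a stand-in for the real `process_type_N` functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <time.h>

#define INPUT_SIZE 8

/* Restore the input to a known state so every benchmark sees the same values. */
void reset_input(float *data, size_t n) {
    for (size_t j = 0; j < n; ++j) data[j] = (float)(n + j);
}

/* Stand-in workload (an assumption for this sketch, not the question's code). */
void work(float *data, size_t n) {
    for (size_t j = 0; j < n; ++j) data[j] = data[j] * 0.5f + 1.0f;
}

/* Fold the results into a single value that is actually used afterwards. */
double checksum(const float *data, size_t n) {
    double sum = 0.0;
    for (size_t j = 0; j < n; ++j) sum += data[j];
    return sum;
}

/* Time `iterations` calls of fn and print the checksum next to the timing. */
double run_once(const char *label, void (*fn)(float *, size_t), int iterations) {
    assert(fn != NULL);
    float data[INPUT_SIZE];
    reset_input(data, INPUT_SIZE);          /* fresh, identical input per run */
    clock_t start = clock();
    for (int i = 0; i < iterations; ++i) fn(data, INPUT_SIZE);
    clock_t end = clock();
    double sum = checksum(data, INPUT_SIZE);
    printf("%s: %.3f s, checksum %.6f\n", label,
           (double)(end - start) / CLOCKS_PER_SEC, sum);
    return sum;
}
```

Because each run starts from the same input, two runs of the same workload should print the same checksum, which also serves as a quick sanity check that the switch and function pointer variants compute the same thing.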

The best benchmark is when running production code with real inputs.

This is cool, thanks, but in the real scenario the 70 types are different, so they can't just be replaced by a single vec. I just used that example as a convenient way of simulating the 70 different types that I need to deal with in the real format. Regarding compile time, I understand what you say, but in practice the code I provided shows a large difference (roughly, I'd say the C version compiles at least twice as fast). C++ most probably carries more with it than just template expansion.
– M. Saintourens, Commented Jan 5 at 11:17
If you don't show us the actual code that you are working on, we cannot give you a good review.
– G. Sliepen, Commented Jan 5 at 13:35
If you don’t show the actual code, we cannot give a review, period. All we can do is play guessing games.
– indi, Commented Jan 5 at 23:21

Rud48 · Accepted Answer · 2025-01-16 23:43:28Z

My first reaction is the team is not doing C++ properly or is misleading themselves about C being less complex. But that's not your issue.

I've done testing similar to yours since I started with C++ in 1990.

Calls through virtual functions or function pointers are slower than direct calls.
Virtual function calls are a decision just as a switch is a decision. Virtual functions are faster than a switch.
The timing difference is nanoseconds on today's PCs.

Without knowing much more about the code base, it is impossible to suggest why C++ might be slower. It actually may be faster and smaller.

Consider that in C, every function must validate the arguments passed to it. Those arguments are the entire state the function requires to do its calculation.

In C++, the state of an object is guaranteed to be valid. Arguments need to be validated, but there are fewer arguments. That means less validation, which means faster and smaller code.

A recent thesis, "Modern C++ in Embedded Systems", examines this issue and many others in the C vs. C++ debate for embedded systems.
