I have some API code that I need to rewrite. The original API is written in C++ and relies heavily on templates and modern C++ features like std::is_same_v. The primary purpose of this API is to read data from a file format that supports roughly 70 different types. For various reasons, I now need to rewrite this functionality in C. This is a strict requirement.
When I asked, "Why C?", the answer I received was that it compiles faster and produces faster executables. Fair enough. But I then proposed: "If I can demonstrate that the C++ code runs at roughly the same speed as C, can I implement it in C++ instead?" Eventually, the team agreed to let me try.
Problem Statement
To replicate support for the 70 different types in C (and to benchmark the feasibility of C vs. C++), I explored two approaches to read data of a given type from the file:
Using a
switch-caseconstruct:
Here, I assign anenumvalue to each supported type, allowing me to write something like:switch (input_type) { case 1: /* read data of type vec1 */ break; ... case 70: /* read data of type vec70 */ break; }This approach is straightforward, but a
switchstatement with 70 cases feels unwieldy and potentially inefficient.Using function pointers:
In this approach, I maintain a static table of function pointers, with each entry pointing to the appropriate function to handle a specific type. This is somewhat akin to a C++-style virtual table (vtable), where each supported type maps to its corresponding read function. I was curious to compare the performance of this approach with theswitch-caseapproach.
C Implementation and Results
Below is my attempt at implementing this in C:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>
#define NUM_TYPES 70
#define ARRAY_SIZE 70
#define NUM_ITERATIONS 100000
// Define 70 unique vector types
#define DEFINE_VEC_TYPE(N) \
typedef struct { \
float data[N]; \
} vec_##N;
#define TEST_VEC_TYPES(f) \
f(1) f(2) f(3) f(4) f(5) f(6) f(7) f(8) f(9) f(10) \
f(11) f(12) f(13) f(14) f(15) f(16) f(17) f(18) f(19) f(20) \
f(21) f(22) f(23) f(24) f(25) f(26) f(27) f(28) f(29) f(30) \
f(31) f(32) f(33) f(34) f(35) f(36) f(37) f(38) f(39) f(40) \
f(41) f(42) f(43) f(44) f(45) f(46) f(47) f(48) f(49) f(50) \
f(51) f(52) f(53) f(54) f(55) f(56) f(57) f(58) f(59) f(60) \
f(61) f(62) f(63) f(64) f(65) f(66) f(67) f(68) f(69) f(70)
TEST_VEC_TYPES(DEFINE_VEC_TYPE)
#define DECLARE_FUNCTION(N) void process_type_##N(vec_##N* v);
TEST_VEC_TYPES(DECLARE_FUNCTION)
// Define 70 unique functions
#define DEFINE_FUNCTION(N) \
void process_type_##N(vec_##N* v) { \
for (int i = 0; i < N; ++i) { \
double result = 0.0; \
for (int j = 0; j < N; ++j) { \
result += sin(v->data[i] + j); \
} \
v->data[i] = (float)result; \
} \
}
TEST_VEC_TYPES(DEFINE_FUNCTION)
typedef void (*process_func)(void* v);
process_func func_table[NUM_TYPES + 1];
#define INIT_FUNC_TABLE(N) func_table[N] = (process_func)process_type_##N;
void initialize_func_table() {
TEST_VEC_TYPES(INIT_FUNC_TABLE)
}
#define GEN_CASE(N) case N: process_type_##N((vec_##N*)v); break;
void switch_case(int type_index, void* v) {
switch (type_index) {
TEST_VEC_TYPES(GEN_CASE)
}
}
void benchmark_function_pointer(void** data) {
clock_t start = clock();
for (int i = 0; i < NUM_ITERATIONS; ++i) {
int type_index = (i % NUM_TYPES) + 1;
func_table[type_index](data[type_index]);
}
clock_t end = clock();
printf("Function pointer table time (C): %.2f seconds\n",
(double)(end - start) / CLOCKS_PER_SEC);
}
void benchmark_switch_case(void** data) {
clock_t start = clock();
for (int i = 0; i < NUM_ITERATIONS; ++i) {
int type_index = (i % NUM_TYPES) + 1;
switch_case(type_index, data[type_index]);
}
clock_t end = clock();
printf("Switch-case time (C): %.2f seconds\n",
(double)(end - start) / CLOCKS_PER_SEC);
}
int main() {
initialize_func_table();
void* data[NUM_TYPES + 1];
for (int i = 1; i <= NUM_TYPES; ++i) {
data[i] = malloc(sizeof(vec_1) * i);
for (int j = 0; j < i; ++j) {
((vec_1*)data[i])->data[j] = (float)(i + j);
}
}
benchmark_switch_case(data);
benchmark_function_pointer(data);
for (int i = 1; i <= NUM_TYPES; ++i) {
free(data[i]);
}
return 0;
}
Compile and run with:
clang -std=c23 test1.c -O3
./a.exe
Results:
Switch-case time (C): 1.61 seconds
Function pointer table time (C): 1.62 seconds
C++ Implementation and Results
In the C++ implementation, I used templates to generate the vec<N> types and their corresponding functions. Here is the code:
// C++ implementation
#include <iostream>
#include <vector>
#include <functional>
#include <cmath>
#include <ctime>
#include <array>
#include <memory>
// 70 different vec<N> types
template <size_t N>
struct vec {
float data[N];
};
using FunctionPointer = std::function<void(void*)>;
std::vector<FunctionPointer> func_table;
template <size_t N>
inline void process_vec(void* vec_ptr) {
vec<N>* v = static_cast<vec<N>*>(vec_ptr);
for (size_t i = 0; i < N; ++i) {
double result = 0.0;
for (size_t j = 0; j < N; ++j) {
result += std::sin(v->data[j] + j);
}
v->data[i] = static_cast<float>(result);
}
}
template <size_t N>
void register_function() {
func_table[N - 1] = [](void* vec_ptr) { process_vec<N>(vec_ptr); };
}
// registration for all 70 types
void do_registration() {
func_table.resize(70);
([]<size_t... Is>(std::index_sequence<Is...>) {
(register_function<Is + 1>(), ...);
})(std::make_index_sequence<70>{});
}
template <size_t N>
vec<N> initialize_vec() {
vec<N> v;
for (size_t i = 0; i < N; ++i) {
v.data[i] = static_cast<float>(N + i);
}
return v;
}
int main() {
do_registration();
std::vector<std::unique_ptr<void, void(*)(void*)>> data;
([&data]<size_t... Is>(std::index_sequence<Is...>) {
((data.emplace_back(
new vec<Is + 1>(initialize_vec<Is + 1>()),
[](void* ptr) { delete static_cast<vec<Is + 1>*>(ptr); })), ...);
})(std::make_index_sequence<70>{});
const size_t num_iterations = 100000;
clock_t start = std::clock();
for (size_t i = 0; i < num_iterations; ++i) {
size_t type_index = i % 70;
func_table[type_index](data[type_index].get());
}
clock_t end = std::clock();
std::cout << "Function pointer table time (C++): "
<< static_cast<double>(end - start) / CLOCKS_PER_SEC
<< " seconds\n";
return 0;
}
Results:
clang++ -std=c++23 test2.cpp -O3
./a.exe
Function pointer table time (C++): 1.655 seconds
Before I comment on the results, I want to stress—because I think this is the first feedback I am going to get—that it’s super hard to compare apples to bananas. I know that. However, the results being this close makes me think that the benchmark is probably reasonably fair. I am not looking at the generated assembly because I am not good enough to read assembly code, but doing so might provide some interesting insights.
Summary and Questions
While the results for both implementations are very close, the C++ code took significantly longer to compile and produced larger executables (as expected). This makes me wonder if the runtime comparison is fair, as compiler optimizations and generated assembly might differ.
I would love feedback on:
- Potential improvements to the C implementations? Is there an alternative to the use of macros here, beside code duplication, alternative to the 2 suggested approaches, etc.
- Whether the approach to benchmarking seems reasonable.
switch()vsfunction pointer... Since this is not "real code", there's very little that should/could be said about it. The question of 'C' vs 'C++' should be answered by which has the greater pool of expertise in the group. Can the junior programmer deal conceptually with templates? "Strict requirements" are not 'flexible' in my experience... \$\endgroup\$mutablein C++). Is it a way of saying you can't be changed but you can still be changed if I decide to. Example of strict requirement that's somehow flexible). \$\endgroup\$