I'm writing a little collection of sorting algorithms and some benchmarks (using Google benchmark).
I wrote the following heap sort but I'm struggling in understanding why it is so slower than std::sort_heap implemented in the C++ Stardard Library. The code is inspired to the algorithm described in Cormen book "Introduction to algorithms"STL
Am I doing something silly?
I have also profiled the code using perf and it seems that the hottest instruction is the comparison function.
Here it goes my implementation:
template <typename Iterator>
inline Iterator left_child(const Iterator begin, Iterator idx) {
return idx + std::distance(begin, idx) + 1;
}
template <class Iterator>
inline Iterator right_child(const Iterator begin, Iterator idx) {
return left_child(begin, idx) + 1;
}
template <class Iterator>
inline Iterator parent(Iterator begin, Iterator idx) {
return begin + std::distance(begin + 1, idx) / 2;
}
template <typename Iterator, typename CMP_FN>
inline void heapify(const Iterator begin, const Iterator end, Iterator idx,
CMP_FN cmp) {
bool go = true;
while (go) {
const Iterator left(left_child(begin, idx));
const Iterator right(right_child(begin, idx));
Iterator candidate(idx);
// this if relies on if short circuiting cmp has to be guarded for segfault
if (left < end && !cmp(*left, *idx)) candidate = left;
//this if relies on if short circuiting cmp has to be guarded for segfault
if (right < end && !cmp(*right, *candidate)) candidate = right;
if (candidate != idx) {
std::swap(*candidate, *idx);
// go=true;
idx = candidate;
} else
go = false;
}
}
template <typename Iterator, typename CMP_FN>
inline void build_heap(const Iterator begin, const Iterator end, CMP_FN cmp) {
const auto d = distance(begin, end) / 2;
for (Iterator s = begin + d; s >= begin; s--)
heapify(begin, end, s, cmp);
}
/////////////////////////////////////////////////
/// @Brief Heap sort implementation (inspiration from Cormen et al Book)
// CMP_FN has type: D -> D -> bool
/////////////////////////////////////////////////
template <typename Iterator, typename CMP_FN>
void heap_sort(Iterator s, Iterator e, CMP_FN cmp) {
auto d = std::distance(s,e);
if(d <= 1)
return;
build_heap(s, e, cmp);
while(d--){
e--;
std::swap(*s, *e);
heapify(s, e, s, cmp);
}
}
When benchmarked those are the timings I get:
----------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations
----------------------------------------------------------------------------------------------------------
benchmark_random_values<quicksorter_hoare>/32768 2 ms 2 ms 154
benchmark_random_values<quicksorter_hoare>/65536 4 ms 4 ms 73
benchmark_random_values<quicksorter_hoare>/131072 8 ms 8 ms 34
benchmark_random_values<quicksorter_hoare>/262144 17 ms 17 ms 16
benchmark_random_values<quicksorter_hoare>/524288 36 ms 36 ms 8
benchmark_random_values<quicksorter_hoare>/1048576 77 ms 77 ms 4
benchmark_random_values<quicksorter_hoare>/2097152 159 ms 159 ms 2
benchmark_random_values<quicksorter_hoare>/4194304 336 ms 336 ms 1
benchmark_random_values<quicksorter_hoare>/8388608 701 ms 701 ms 1
benchmark_random_values<quicksorter_hoare>/16777216 1466 ms 1466 ms 1
benchmark_random_values<heap_sorter>/32768 2 ms 2 ms 111
benchmark_random_values<heap_sorter>/65536 5 ms 5 ms 53
benchmark_random_values<heap_sorter>/131072 12 ms 12 ms 24
benchmark_random_values<heap_sorter>/262144 25 ms 25 ms 11
benchmark_random_values<heap_sorter>/524288 55 ms 55 ms 5
benchmark_random_values<heap_sorter>/1048576 120 ms 120 ms 2
benchmark_random_values<heap_sorter>/2097152 264 ms 264 ms 1
benchmark_random_values<heap_sorter>/4194304 659 ms 659 ms 1
benchmark_random_values<heap_sorter>/8388608 1657 ms 1657 ms 1
benchmark_random_values<heap_sorter>/16777216 4038 ms 4038 ms 1
benchmark_random_values<sort_heap_std>/32768 2 ms 2 ms 108
benchmark_random_values<sort_heap_std>/65536 5 ms 5 ms 60
benchmark_random_values<sort_heap_std>/131072 9 ms 9 ms 30
benchmark_random_values<sort_heap_std>/262144 20 ms 20 ms 12
benchmark_random_values<sort_heap_std>/524288 40 ms 40 ms 7
benchmark_random_values<sort_heap_std>/1048576 87 ms 87 ms 3
benchmark_random_values<sort_heap_std>/2097152 175 ms 175 ms 2
benchmark_random_values<sort_heap_std>/4194304 365 ms 365 ms 1
benchmark_random_values<sort_heap_std>/8388608 757 ms 757 ms 1
benchmark_random_values<sort_heap_std>/16777216 1736 ms 1736 ms 1
As you can see from the timing the heap sorter is much slower than the std::sort_heap and also than a quicksort that I wrote.
stdlibwhen refering to the standard library? \$\endgroup\$