Best Practices for loading primitive data types from raw bytes

Question

On occasion, it is very useful to reinterpret raw bytes as structured data: ints, floats, etc. Examples include reading from a mmapped file, reading some sort of in-memory data frame, or other tasks that involve interpreting raw bytes.

Let's make the following assumptions:

Both the input bytes and the computer are little-endian.
All of the reads are properly aligned.
The code should be (somewhat) portable: it should work on the major compilers (gcc, clang, MSVC), and it should work on x86-64 and ARM.
All supported platforms are sane: bytes are 8 bits in size (CHAR_BIT == 8), and the fixed-width integer types int8_t, int16_t, int32_t, etc. exist.
The source is constant (e.g., we do not care if reads are re-ordered or optimized away).
The code will be compiled with a modern optimizing compiler (e.g., gcc or clang).

What are the best practices for interpreting raw bytes as primitive data types?

Scenario 1: loading a single value

Consider the following function:

// Load int32_t from src[0] to src[3]
inline int32_t load_int32( char const* src );

It's marked inline because we want it to be inlined so that the compiler can optimize it.

What's the preferred way to implement this function in C++17?

Option 1: use memcpy

Pros: not UB, major compilers optimize this well because they know what's happening
Cons: Debug build performance on MSVC sucks

inline int32_t load_int32( char const* src ) {
    int32_t dest;
    memcpy( &dest, src, sizeof( int32_t ) );
    return dest;
}
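
The same memcpy pattern extends to any trivially copyable type. A minimal sketch, assuming a hypothetical helper named load_raw (the name and the static_assert are additions for illustration, not part of the code above), and assuming the bytes are already in the host's byte order:

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical generic helper: copies sizeof(T) bytes from src into a T.
// Assumes the bytes at src are stored in the host's byte order.
template <class T>
inline T load_raw(char const* src) noexcept {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy-based loading requires a trivially copyable type");
    T dest;
    std::memcpy(&dest, src, sizeof(T));
    return dest;
}

// The function from the question, expressed through the helper.
inline std::int32_t load_int32(char const* src) noexcept {
    return load_raw<std::int32_t>(src);
}
```

Because the copy goes through memcpy, this version does not depend on src being aligned for T.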

Option 2: reinterpret_cast

Pros: code is slightly cleaner, it's obvious what's going on
Cons: I believe this is technically UB since src doesn't originally point to an int32_t

inline int32_t load_int32( char const* src ) {
    return *reinterpret_cast<int32_t const*>( src );
}

Scenario 2: loading an array of values

Prior to C++23, (afaik) there is no standards-compliant way to get a pointer-to-int. So the options are:

Option 1: write an iterator which uses memcpy under the hood

inline raw_byte_iter<int32_t> load_int32_array( char const* src ) {
    return { src };
}

raw_byte_iter<T> holds a char const* and uses memcpy to load values:

template<class T>
class raw_byte_iter
{
    char const* src;
  public:
    using reference = T;
    using difference_type = ptrdiff_t;
    // ...
    
    raw_byte_iter( char const* src ) noexcept: src(src) {}
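    // Pre-increment returns a reference so the iterator can be chained.
    raw_byte_iter& operator++() noexcept { src += sizeof(T); return *this; }

    // Must be const because operator[] below is const and calls it.
    // The cast keeps the multiplication signed so negative offsets work.
    raw_byte_iter operator+(ptrdiff_t diff) const noexcept { return raw_byte_iter( src + static_cast<ptrdiff_t>(sizeof(T)) * diff ); }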

    T operator*() const noexcept {
        T value;
        memcpy( &value, src, sizeof(T) );
        return value;
    }

    T operator[](ptrdiff_t i) const noexcept { return *( *this + i ); }

    // ...
};

Pros: This should optimize well on gcc and clang; MSVC will hopefully be fine with it
Cons:
- If you want this to be a proper random access iterator then the reference type has to be a value of T. Which is disgusting. It also involves writing a whole iterator class.
- You can't pass raw_byte_iter to any low-level APIs which expect a pointer
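
If an extra copy is acceptable, the second drawback can be sidestepped by copying the whole range into properly typed storage, which yields a real int32_t pointer via data(). A sketch under that assumption; copy_int32_array is a hypothetical name, and the cost is one allocation plus one bulk memcpy, which may defeat the purpose for a large mmapped file:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copies n int32_t values out of a raw byte buffer.
// Assumes src holds at least n * sizeof(int32_t) bytes in host byte order.
inline std::vector<std::int32_t> copy_int32_array(char const* src, std::size_t n) {
    std::vector<std::int32_t> out(n);
    if (n != 0) {
        // One bulk copy; out.data() is a genuine int32_t* afterwards.
        std::memcpy(out.data(), src, n * sizeof(std::int32_t));
    }
    return out;
}
```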

Option 2: Use reinterpret_cast?

In this case, reinterpret_cast allows for a much cleaner implementation, but again, it's technically undefined behavior!

inline int32_t const* load_int32_array( char const* src ) {
    return reinterpret_cast<int32_t const*>( src );
}

Pros: very simple implementation, no need to worry about writing an iterator wrapper
Pros: can be directly passed to other low-level APIs
Cons: technically UB

Option 3 (C++23 only): std::start_lifetime_as_array

I believe C++23 allows us to bless this code with std::start_lifetime_as_array:

inline int32_t const* load_int32_array( char const* src, size_t n ) {
    return std::start_lifetime_as_array<int32_t>( src, n );
}

Pros: I believe this is the best solution
Cons: only available in C++23 and above, I can't use it right now
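
One way to prepare for this is to guard the call with the library feature-test macro __cpp_lib_start_lifetime_as. A sketch, assuming a hypothetical wrapper named view_int32_array; note that the fallback branch is the same technically-UB cast as Option 2, so this only improves matters once the standard library provides the function:

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>  // defines __cpp_lib_start_lifetime_as when the library supports it

// Hypothetical wrapper: uses std::start_lifetime_as_array when available,
// otherwise falls back to reinterpret_cast (technically UB, as in Option 2).
// Assumes src is suitably aligned for int32_t.
inline std::int32_t const* view_int32_array(char const* src, std::size_t n) {
#if defined(__cpp_lib_start_lifetime_as)
    return std::start_lifetime_as_array<std::int32_t>(src, n);
#else
    (void)n;  // unused in the fallback path
    return reinterpret_cast<std::int32_t const*>(src);
#endif
}
```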

Summary

The root of the question is this:

Does it matter that reinterpret_cast here is UB, or is this a scenario where compilers do the right thing because too much code breaks otherwise? What makes the most sense for a production codebase?

I see two votes to close as being off-topic due to asking about a coding problem. I don't think the heart of this question devolves to "how can I get teh codez to werk." This feels more like a conceptual question about the conversion of char to int, which I feel is on-topic.
– Greg Burghardt, Commented Dec 20, 2024 at 21:19
What do you mean by "debug build performance sucks"? First of all, that's weird. Secondly, why does it matter? Are you doing this conversion in a tight loop, in a performance critical section? Weird. Thirdly: why does debug build even matter? You should publish release build anyway. All in all: I don't get that objection at all.
– freakish, Commented Dec 21, 2024 at 12:18
@freakish It is quite possible that a memcpy of exactly four bytes is translated to a four byte read with optimisation, leaving the result in a register, and to a call to a generic “memcpy” without optimisation.
– gnasher729, Commented Dec 21, 2024 at 14:11
@gnasher729 I understand. But that doesn't answer my main question at all: who cares about performance of debug builds?
– freakish, Commented Dec 21, 2024 at 16:34
You said “this is weird”. I said “no it isn’t and here is why”.
– gnasher729, Commented Dec 21, 2024 at 20:10

Martin Gleich · Accepted Answer · 2024-12-21 13:43:00Z

I would recommend reading individual bytes and combining them into an integer with shift and OR operations. This has multiple advantages:

It is robust with regard to alignment and endianness.
It is fully standard-compliant.
With a modern compiler, it is as fast as the other solutions.

I tested it with the following code:

#include <cstring>

int read_big_endian(const unsigned char* bytes)
{
    return (bytes[0] << 24) | (bytes[1] << 16) | (bytes[2]<<8) | bytes[3];
}
int read_little_endian(const unsigned char* bytes)
{
   return (bytes[3] << 24) | (bytes[2] << 16) | (bytes[1]<<8) | bytes[0];
}

On x86, it compiled to a single instruction for the little-endian case and two instructions for the big-endian case. I tested it with gcc and clang; see https://godbolt.org/z/bnqezEjjT.
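
One caveat with the snippet above: each unsigned char is promoted to int before shifting, so a top byte of 0x80 or more shifts into the sign bit, and the standard only fully defines that result from C++20 onward. A variant that does the arithmetic in uint32_t avoids the question. This is a sketch with hypothetical function names, not the code behind the godbolt link, and the final conversion to int32_t is itself implementation-defined before C++20 (two's-complement wrap on the mainstream compilers):

```cpp
#include <cstdint>

// Little-endian: bytes[0] is the least significant byte.
inline std::int32_t read_little_endian_i32(unsigned char const* bytes) {
    std::uint32_t const u = (static_cast<std::uint32_t>(bytes[0]))
                          | (static_cast<std::uint32_t>(bytes[1]) << 8)
                          | (static_cast<std::uint32_t>(bytes[2]) << 16)
                          | (static_cast<std::uint32_t>(bytes[3]) << 24);
    return static_cast<std::int32_t>(u);
}

// Big-endian: bytes[0] is the most significant byte.
inline std::int32_t read_big_endian_i32(unsigned char const* bytes) {
    std::uint32_t const u = (static_cast<std::uint32_t>(bytes[0]) << 24)
                          | (static_cast<std::uint32_t>(bytes[1]) << 16)
                          | (static_cast<std::uint32_t>(bytes[2]) << 8)
                          | (static_cast<std::uint32_t>(bytes[3]));
    return static_cast<std::int32_t>(u);
}
```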

One important note: when working with bytes, you should always use unsigned char instead of char. Plain char causes a lot of problems with conversions to integers and with arithmetic; see https://stackoverflow.com/questions/75191/what-is-an-unsigned-char for some details. If you are using C++, std::byte might be even better.

Unfortunately, MSVC is not capable of optimizing this, but it can optimize memcpy.
– freakish, Commented Dec 21, 2024 at 17:11

Christophe · Accepted Answer · 2024-12-20 17:12:13Z

Option 2 is not advisable, because int might have alignment requirements that a char pointer is not guaranteed to satisfy. The code could therefore work with some pointers and segfault with others, depending on the compiler and target. And, as always with UB, anything could happen, including that it works.

(In full transparency: I could not reproduce the alignment issue on a common target architecture: https://ideone.com/uiF53Q )
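
Whether a particular pointer meets the alignment requirement can be checked at run time before casting. A minimal sketch, assuming a hypothetical helper named is_aligned_for and assuming that converting a pointer to uintptr_t yields its address, which holds on mainstream platforms but is implementation-defined:

```cpp
#include <cstdint>

// Hypothetical helper: true if p is aligned suitably for an object of type T.
// Relies on uintptr_t holding the numeric address (implementation-defined).
template <class T>
inline bool is_aligned_for(void const* p) noexcept {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}
```

A passing check only rules out the alignment problem; it does not make the reinterpret_cast itself well-defined.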

Option 1 is slightly better, but it assumes that the reader and the writer have the same endianness (e.g. if you load your chars from a file potentially produced on another system). The code then might not be interoperable across platforms.

The same applies to the array variant. I'm not yet familiar enough with the guarantees and constraints of std::start_lifetime_as_array(), but I would be surprised if it were not also subject to alignment issues.

gnasher729 · Accepted Answer · 2024-12-21 11:03:21Z

If you read binary data, I’d strongly suggest that you read it as an array of unsigned char and build values from the bytes you read. I would not make any assumptions about byte ordering.

So to read a 32-bit little-endian value: (byte[3] << 24) | (byte[2] << 16) | (byte[1] << 8) | byte[0]. This works with any processor, and the format of your binary data should be documented. You must also always assume that binary data is not aligned. With a bit of luck, the compiler recognises the pattern and optimises it for your processor.
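
The same byte-by-byte construction can be written once for every unsigned width. A sketch, assuming a hypothetical template named load_le; whether a given compiler collapses the loop into a single load is not guaranteed:

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical helper: builds an unsigned integer of type U from
// sizeof(U) little-endian bytes. No alignment or host byte order assumed.
template <class U>
inline U load_le(unsigned char const* bytes) noexcept {
    static_assert(std::is_unsigned<U>::value,
                  "load_le is only defined for unsigned integer types");
    U value = 0;
    for (std::size_t i = 0; i < sizeof(U); ++i) {
        // Widen the byte to U first so the shift happens at full width.
        value |= static_cast<U>(static_cast<U>(bytes[i]) << (8 * i));
    }
    return value;
}
```

For example, load_le<std::uint16_t> applied to the bytes 0x34, 0x12 yields 0x1234 on any host.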

gcc and clang are capable of optimizing this pattern, which was quite surprising to me, to be honest. MSVC, however, can't do that. It's still good advice, though. I seriously doubt that this is a performance bottleneck.
– freakish, Commented Dec 21, 2024 at 13:33
