
I have raw binary data blocks (CBOR-encoded, in fact). To read numeric values I use a common idiom like this:

template <typename T> // T can be uint64_t, double, uint32_t, etc...
auto read(const uint8_t *ptr) -> T {
    return *((T *)(ptr)); // endianness conversion is applied later
}

This solution works on x86/x86_64 PCs and on arm/arm64 iOS. But on arm/armv7 Android, with the clang compiler at the default release optimization level (-Os), I receive SIGBUS with code 1 (unaligned read) for types larger than one byte. I fixed that problem with another solution:

template <typename T>
auto read(const uint8_t *ptr) -> T {
    union {
        uint8_t buf[sizeof(T)];
        T value;
    } u;
    memcpy(u.buf, ptr, sizeof(T));
    return u.value;
}

Is there a platform-independent solution that will not hurt performance?

  • I think that's probably as good as you will get. Commented Jun 12, 2016 at 15:53
  • Use proper (de)serialisation instead of these undefined behaviour reinterpretations. You ran into some problems already, there can be more. Commented Jun 12, 2016 at 15:53

1 Answer


Caveat: this answer assumes, as the question does, that the machine's integer representation is little-endian.

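The only platform-independent and correct way is to use memcpy. The pointer cast in the question is undefined behaviour when ptr is not suitably aligned for T, which is why it crashes on armv7. You don't need a union.

Don't worry about efficiency. Compilers treat memcpy as a built-in: for a small, fixed size they typically replace the call with a plain load when the target allows it.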

Example, compiled for x86-64:

#include <cstring>
#include <cstdint>

template <typename T>
auto read(const uint8_t *ptr) -> T {
  T result;
  std::memcpy(&result, ptr, sizeof(T));
  return result;
}

extern const uint8_t* get_bytes();
extern void emit(std::uint64_t);

int main()
{
  auto x = read<std::uint64_t>(get_bytes());
  emit(x);

}

yields assembler:

main:
        subq    $8, %rsp
        call    get_bytes()
        movq    (%rax), %rdi         # note: the memcpy call is elided entirely
        call    emit(unsigned long)
        xorl    %eax, %eax
        addq    $8, %rsp
        ret
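
To check the behaviour on a deliberately misaligned address, a minimal sketch along the following lines may help. The names read_unaligned and round_trips_unaligned are hypothetical, chosen for this illustration only; the reader itself is the same memcpy-based function shown above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Same memcpy-based reader as above, under a hypothetical name.
template <typename T>
T read_unaligned(const std::uint8_t* ptr) {
    T result;
    std::memcpy(&result, ptr, sizeof(T));  // valid for any alignment of ptr
    return result;
}

// Hypothetical helper: stores `value` at an odd offset and reads it back.
template <typename T>
bool round_trips_unaligned(T value) {
    alignas(8) std::uint8_t buffer[sizeof(T) + 1] = {};
    // buffer is 8-byte aligned, so buffer + 1 is misaligned for any T
    // wider than one byte.
    std::memcpy(buffer + 1, &value, sizeof(T));
    return read_unaligned<T>(buffer + 1) == value;
}
```

Calling round_trips_unaligned<std::uint64_t>(0x0102030405060708ULL) should return true on any conforming implementation, including targets where the cast-based version raises SIGBUS.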

note: endian-ness

You can make this solution truly portable by adding a runtime endianness check. In practice the check is elided, because the optimizer sees through it:

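inline bool is_little_endian()
{
    // Not constexpr: a pointer reinterpretation is not allowed in a constant expression.
    const short int number = 0x1;
    const unsigned char *numPtr = reinterpret_cast<const unsigned char*>(&number);
    return (numPtr[0] == 1);
}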


template <typename T>
auto read(const uint8_t *ptr) -> T {
  T result = 0;
  if (is_little_endian())
  {
    std::memcpy(&result, ptr, sizeof(result));
  }
  else
  {
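    // Assemble the little-endian byte stream with shifts (integer T only).
    for (std::size_t i = 0 ; i < sizeof(T) ; ++i)
    {
      // Widen each byte to T before shifting; shifting a plain int overflows for wide T.
      result |= static_cast<T>(static_cast<T>(*ptr++) << (8 * i));
    }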
  }
  return result;
}

The resulting machine code is unchanged:

main:
        subq    $8, %rsp
        call    get_bytes()
        movq    (%rax), %rdi
        call    emit(unsigned long)
        xorl    %eax, %eax
        addq    $8, %rsp
        ret
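
The shift-based approach suggested in the comments avoids the host-endianness check altogether. The sketch below is one possible form, not the method of anyone in this thread: read_be is a hypothetical name, and it assumes the input is big-endian (network byte order, which is what CBOR uses for multi-byte integers) and that T is an unsigned integer type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical helper: decodes a big-endian unsigned integer byte by byte.
// It only reads single bytes, so it has no alignment requirement, and the
// result does not depend on the byte order of the host.
template <typename T>
T read_be(const std::uint8_t* ptr) {
    static_assert(std::is_unsigned<T>::value, "unsigned integer types only");
    T result = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        // Shift the accumulated value up one byte, then append the next byte.
        result = static_cast<T>((result << 8) | ptr[i]);
    }
    return result;
}
```

For the bytes 01 02 03 04, read_be<std::uint32_t> returns 0x01020304 on both little-endian and big-endian hosts. A floating-point value could be obtained by decoding the same-sized unsigned integer first and then copying its bytes into a double or float with memcpy.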

10 Comments

Nonsense! The only compliant and platform-independent way is serialisation with bitshifts! memcpy does not care about endianness or representation.
Endianness is not an issue, since all the compilers I need have an intrinsic to convert byte order. When there is no intrinsic, I fall back to bitshifts.
@olaf I am aware of the endianness issue, and have assumed that the OP knows about it too. I was addressing the issue of transferring a byte stream to an integer. I'll add a caveat to the answer.
@Olaf updated, and an endian-proof solution added, with the benefit that it's still 100% efficient on a little-endian x86.
@SBKarr: What is the problem using proper serialisation in the first place? A good compiler will not generate worse code than for the "hackish" approach with reinterpretation, but there is no need for testing, nor other target-dependent stuff.