How should I approach parsing the network packet using C++ template?

Question

Let's say I have an application that keeps receiving a byte stream from a socket. I have documentation that describes the packet layout: the total header size, the total payload size, and the data type at each byte offset. I want to parse it into a struct. The approach I can think of is to declare a struct and disable padding with a compiler-specific attribute, something like:

struct Payload
{
   char field1;
   std::uint32_t field2;
   std::uint32_t field3;
   char field5;
} __attribute__((packed));

and then I can declare a buffer, memcpy the bytes into the buffer, and reinterpret_cast it to my structure. Another way I can think of is to process the bytes one by one and fill the data into the struct. I think either one should work, but both feel old school and are probably not safe.

The reinterpret_cast approach mentioned above should look something like:

void receive(const char* data, std::size_t data_size)
{
    if (data_size == sizeof(Payload))
    {
        const Payload* payload = reinterpret_cast<const Payload*>(data);
        // ... further processing ...
    }
}

I'm wondering whether there are any better approaches (more modern C++ style? more elegant?) for this kind of use case. I feel like metaprogramming should help, but I don't have an idea of how to use it.

Can anyone share some thoughts, or point me to related references, resources, or even relevant open source code, so that I can learn how to solve this kind of problem in a more elegant way?

If you use the struct presented in the code snippet, definitely make sure to use static_assert with sizeof and offsetof to ensure the compiler actually creates a structure matching your header. Note that the code may not be portable, though, and you most likely need to do an endianness conversion on the numbers: network protocols usually encode numbers as big endian, while many architectures use little endian...
– fabian, Commented May 26, 2022 at 17:17
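To illustrate the static_assert suggestion, here is a minimal sketch of the layout checks. It assumes the 10-byte layout implied by the question's struct (1 + 4 + 4 + 1 bytes) and the GCC/Clang `__attribute__((packed))` syntax; the expected offsets are derived from that struct, not from any real protocol document.

```cpp
#include <cstddef>   // offsetof
#include <cstdint>   // std::uint32_t

// Packed layout as in the question (GCC/Clang attribute syntax).
struct Payload
{
    char          field1;
    std::uint32_t field2;
    std::uint32_t field3;
    char          field5;
} __attribute__((packed));

// Assumed wire layout: 1 + 4 + 4 + 1 = 10 bytes with no padding.
static_assert(sizeof(Payload) == 10, "Payload must be exactly 10 bytes");
static_assert(offsetof(Payload, field1) == 0, "field1 must start at byte 0");
static_assert(offsetof(Payload, field2) == 1, "field2 must start at byte 1");
static_assert(offsetof(Payload, field3) == 5, "field3 must start at byte 5");
static_assert(offsetof(Payload, field5) == 9, "field5 must start at byte 9");
```

If the attribute were missing or ignored, the checks would fail at compile time instead of silently misreading packets. They say nothing about byte order, which still has to be handled separately.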
Other than what @fabian said, if performance matters, be aware that your suggested approach could reduce it more than expected.
– Federico, Commented May 26, 2022 at 17:53
"I can declare a buffer and memcpy the bytes to the buffer and reinterpret_cast it to my structure" No, you may not. That's undefined behavior. However, you can memcpy the bytes into the address of an instance of the struct, reinterpret_cast to char*.
– user4442671, Commented May 26, 2022 at 18:10
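A minimal sketch of the memcpy variant this comment describes, assuming the packed struct from the question and GCC/Clang attribute syntax. The three-argument `receive` signature and the `demo_receive` helper are hypothetical, added only for illustration. Since `std::memcpy` takes `void*`, no explicit cast is needed here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct Payload
{
    char          field1;
    std::uint32_t field2;
    std::uint32_t field3;
    char          field5;
} __attribute__((packed));

// Copies the raw bytes into an existing Payload object instead of
// pointing a Payload* at the buffer. Returns false on a size mismatch.
bool receive(const char* data, std::size_t data_size, Payload& out)
{
    if (data_size != sizeof(Payload)) {
        return false;
    }
    std::memcpy(&out, data, sizeof(Payload));
    return true;
}

// Hypothetical usage: single-byte fields can be checked directly.
void demo_receive()
{
    const char bytes[10] = {'A', 1, 0, 0, 0, 2, 0, 0, 0, 'Z'};
    Payload p{};
    const bool ok = receive(bytes, sizeof bytes, p);
    assert(ok && p.field1 == 'A' && p.field5 == 'Z');
}
```

The multi-byte fields still hold whatever byte order was on the wire, so an endianness conversion may be needed before using them.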
Congratulations. You just killed performance. Never use a packed struct to access data more than once. You need to memcpy the individual fields into a non-packed structure, or copy-assign them from the packed version to a non-packed version (which just automates the memcpy).
– Goswin von Brederlow, Commented May 26, 2022 at 19:50
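One way to read this comment is sketched below: keep a packed struct only as the wire image and copy each field once into a naturally aligned struct for all later use. The names `PackedPayload`, `unpack`, and `demo_unpack` are hypothetical, and the sketch assumes GCC/Clang attribute syntax.

```cpp
#include <cassert>
#include <cstdint>

// Wire image: packed, used only as the source of a single copy.
struct PackedPayload
{
    char          field1;
    std::uint32_t field2;
    std::uint32_t field3;
    char          field5;
} __attribute__((packed));

// Working copy: normal alignment, used for all further processing.
struct Payload
{
    char          field1;
    std::uint32_t field2;
    std::uint32_t field3;
    char          field5;
};

// Copy-assign field by field; the compiler emits whatever unaligned
// loads the packed source requires.
Payload unpack(const PackedPayload& wire)
{
    Payload p;
    p.field1 = wire.field1;
    p.field2 = wire.field2;
    p.field3 = wire.field3;
    p.field5 = wire.field5;
    return p;
}

// Hypothetical usage.
void demo_unpack()
{
    PackedPayload wire{};
    wire.field2 = 0x01020304u;
    const Payload p = unpack(wire);
    assert(p.field2 == 0x01020304u);
}
```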
You may want some sort of mechanism to serialize the meta info into the network packet, and then a mechanism to deserialize the packet into a meta info structure. If your network packet includes variable-length entries, it can get difficult without some mechanism like that. See this for an example of defining an interface over a fairly complicated on-the-wire data packet.
– jxh, Commented May 26, 2022 at 22:13

score 1 · Accepted Answer · 2022-05-26 19:57:13Z

There are many different ways of approaching this. Here's one:

Keeping in mind that reading a struct from a network stream is semantically the same thing as reading a single value, the operation should look the same in either case.

Note that from what you posted, I am inferring that you will not be dealing with types with non-trivial default constructors. If that were the case, I would approach things a bit differently.

In this approach, we:

Define a read_into(src&, dst&) function that takes in a source of raw bytes, as well as an object to populate.
Provide a general implementation for all integral types, converting from network byte order when appropriate.
Overload the function for our struct, calling read_into() on each field in the order expected on the wire.

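#include <cstddef>      // std::byte, std::size_t
#include <cstdint>
#include <bit>
#include <concepts>
#include <array>
#include <algorithm>
#include <type_traits>  // std::has_unique_object_representations_v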

// Use std::byteswap when available. In the meantime, just lift the implementation from 
// https://en.cppreference.com/w/cpp/numeric/byteswap
template<std::integral T>
constexpr T byteswap(T value) noexcept
{
    static_assert(std::has_unique_object_representations_v<T>, "T may not have padding bits");
    auto value_representation = std::bit_cast<std::array<std::byte, sizeof(T)>>(value);
    std::ranges::reverse(value_representation);
    return std::bit_cast<T>(value_representation);
}

template<typename T>
concept DataSource = requires(T& x, char* dst, std::size_t size ) {
  {x.read(dst, size)};
};

// General read implementation for all integral types
template<std::endian network_order = std::endian::big>
void read_into(DataSource auto& src, std::integral auto& dst) {
  src.read(reinterpret_cast<char*>(&dst), sizeof(dst));

  if constexpr (sizeof(dst) > 1 && std::endian::native != network_order) {
    dst = byteswap(dst);
  }
}

struct Payload
{
   char field1;
   std::uint32_t field2;
   std::uint32_t field3;
   char field5;
};

// Read implementation specific to Payload
void read_into(DataSource auto& src, Payload& dst) {
  read_into(src, dst.field1);
  read_into<std::endian::little>(src, dst.field2);
  read_into(src, dst.field3);
  read_into(src, dst.field5);
}

// mind you, nothing stops you from just reading directly into the struct, but beware of endianness issues:
// struct Payload
// {
//    char field1;
//    std::uint32_t field2;
//    std::uint32_t field3;
//    char field5;
// } __attribute__((packed));
// void read_into(DataSource auto& src, Payload& dst) {
//   src.read(reinterpret_cast<char*>(&dst), sizeof(Payload));
// }

// Example
struct some_data_source {
  std::size_t read(char*, std::size_t size);
};

void foo() {
    some_data_source data;

    Payload p;
    read_into(data, p);
}
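The example above only declares `some_data_source::read`. As a rough sketch, an in-memory source with the same `read(char*, std::size_t)` shape could look like the following; `buffer_source`, `remaining`, and `demo_buffer_source` are hypothetical names chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical in-memory source: hands out bytes from a fixed buffer.
class buffer_source
{
public:
    buffer_source(const char* data, std::size_t size)
        : data_(data), size_(size) {}

    // Copies up to `size` bytes into dst and returns how many were copied.
    std::size_t read(char* dst, std::size_t size)
    {
        const std::size_t n = std::min(size, size_ - pos_);
        if (n != 0) {
            std::memcpy(dst, data_ + pos_, n);
            pos_ += n;
        }
        return n;
    }

    // Number of bytes not yet consumed.
    std::size_t remaining() const { return size_ - pos_; }

private:
    const char* data_;
    std::size_t size_;
    std::size_t pos_ = 0;
};

// Hypothetical usage: read three of four available bytes.
void demo_buffer_source()
{
    const char bytes[4] = {'a', 'b', 'c', 'd'};
    buffer_source src(bytes, sizeof bytes);
    char out[3] = {};
    const std::size_t n = src.read(out, sizeof out);
    assert(n == 3 && out[0] == 'a' && src.remaining() == 1);
}
```

Note that `read_into` as written ignores the return value of `read`, so a short read would go unnoticed; real code would likely want to check it.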

An alternative API could have been dst.field2 = read<std::uint32_t>(src), which has the drawback of requiring you to be explicit about the type, but is more appropriate if you have to deal with non-trivial constructors.

See it in action on Godbolt: https://gcc.godbolt.org/z/77rvYE1qn
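The alternative API mentioned above, `dst.field2 = read<std::uint32_t>(src)`, could be sketched roughly as follows. This is not the answer's implementation: it swaps the byteswap step for shift-based assembly, assumes every field is big-endian on the wire, and uses a hypothetical `buffer_source` and `read_payload` for illustration. It also ignores short reads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for any source exposing read(char*, std::size_t).
struct buffer_source
{
    const char* data;
    std::size_t size;
    std::size_t pos = 0;

    std::size_t read(char* dst, std::size_t n)
    {
        const std::size_t avail = size - pos;
        if (n > avail) n = avail;
        std::memcpy(dst, data + pos, n);
        pos += n;
        return n;
    }
};

// Returns the value instead of filling an out-parameter.
template <typename T, typename Source>
T read(Source& src)
{
    static_assert(std::is_integral<T>::value && !std::is_same<T, bool>::value,
                  "read<T> handles non-bool integral types only");
    unsigned char raw[sizeof(T)] = {};
    src.read(reinterpret_cast<char*>(raw), sizeof(T));

    // Assemble big-endian bytes with shifts; independent of host endianness.
    using U = typename std::make_unsigned<T>::type;
    U value = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        value = static_cast<U>((value << 8) | raw[i]);
    }
    return static_cast<T>(value);
}

struct Payload
{
    char          field1;
    std::uint32_t field2;
    std::uint32_t field3;
    char          field5;
};

// The type must be spelled out at every call site.
template <typename Source>
Payload read_payload(Source& src)
{
    Payload p;
    p.field1 = read<char>(src);
    p.field2 = read<std::uint32_t>(src);
    p.field3 = read<std::uint32_t>(src);
    p.field5 = read<char>(src);
    return p;
}

// Hypothetical usage: bytes 00 00 01 02 decode to 0x0102.
void demo_read()
{
    const char bytes[10] = {'A', 0, 0, 1, 2, 0, 0, 0, 7, 'Z'};
    buffer_source src{bytes, sizeof bytes};
    const Payload p = read_payload(src);
    assert(p.field2 == 0x0102u && p.field3 == 7u);
}
```

Because each value is returned rather than written into an existing object, the destination never has to be default-constructed first, which is the benefit hinted at for types with non-trivial constructors.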

edited May 26, 2022 at 19:57

answered May 26, 2022 at 18:46

user4442671

4 Comments

Federico Over a year ago

With this solution (the good old way, reading every field one by one instead of the entire structure as the OP wanted to do), does the Payload still need to be packed?

user4442671 Over a year ago

@Federico Well, nothing stops someone from just reading directly into the struct (I've amended the code in the answer to reflect that). But no, there is no hard requirement for the struct to be packed in that case.

Kelvinyu1117 Over a year ago

@Frank, the struct probably doesn't have to be packed. My reasons for making it packed are twofold: 1. the bytes are contiguous in the stream, 2. I'm worried about cache locality.

user4442671 Over a year ago

@Kelvinyu1117 Loading misaligned data into a register is much, much more expensive than the cache locality implication of padding structs like this.
