3

I am currently developing some software in C++ where I am sending and receiving custom data packets. I want to parse and manage these packets in a well structured manner. Obviously I am first receiving the header and after that the body of the data. The main problem is that I don't like creating a Packet-Object with only the header information and later on adding the body data. What is an elegant way of parsing and storing custom data packets?

Here is a rough sketch of what such a custom data packet could look like:

+-------+---------+---------+----------+------+
| Magic | Command | Options | Bodysize | Body |
+-------+---------+---------+----------+------+

(Lets assume Magic is 4 bytes, Command 1 byte, Options 2 bytes, Bodysize 4 bytes and the body itself is variable in length.) How would I parse this without using any third party libraries?

Normally I'd say something like this could be done to store packet data:

#include <array>

class Packet {
public:

    explicit Packet(std::array<char, 10> headerbytes);

    void set_body(std::vector<char> data);
    std::vector<char> get_body();

    int8_t get_command();

    int16_t get_options();

    bool is_valid();

private:

    bool valid;

    int8_t _command;

    int16_t _options;

    int32_t body_size;

    std::vector<char> _data;

};

The problem is that I provide the header-information first and than add the body data in a hacky way later on. The packet object has a point of time where it is accessible in an incomplete state.

I first receive the header and after the header was received another receive call is made to read the body. Would it make sense to have a parser instance that populates information into the packet object only make it accessible once it holds all needed information? Would it make sense to have a separate class for the header and the body? What would be the best design choice?

I am developing with C++ and for the sending and receiving of data over sockets the boost library is used.

6
  • 1
    "I don't like creating a Packet-Object with only the header information and later on adding the body data" - Why not? Commented Apr 21, 2019 at 16:42
  • 1
    @Jesper I think this would be bad design because the object can be accessed in an incomplete state. I am not sure what is common practice neither do I know what would be the best design choice for this. I am thinking of some packet processor that performs parsing and a packet data container that is filled and only accessible once populated with all data Commented Apr 21, 2019 at 23:19
  • Another way to do is a simple state machine. Many ways to implement one Commented Apr 25, 2019 at 0:13
  • Whenever you have a "full stop" (e.g. return something or store it, basically interface type), what you produced should be a type on its own. If it's a header, then call it MessageHeader. In c++ move ctor is extremely inexpensive: you might simply create e.g. Message(MessageHeader&&, MessageBody&&) ctor. If you need help on polymorphic message body, lmk, and I give you details on cps-style visitors / factories which might be the best fit here (they're fast and use stack; yet provide explicit types). Commented Apr 25, 2019 at 22:14
  • Thanks. Yeah some examples and advice what to use in this scenario is very welcome! Commented Apr 25, 2019 at 22:16

5 Answers 5

1

If you don’t want to tie the data reading into one complete constructor (for understandable reasons of separation of concerns), this is a good application for non-polymorphic inheritance:

struct Header {
  static constexpr SIZE=10;
  Header(std::array<char,SIZE>);

  std::int8_t get_command() const {return command;}
  std::int16_t get_options() const {return options;}
  std::int32_t body_size() const {return length;}

private:
  std::int8_t command;
  std::int16_t options;
  std::int32_t length;
};

struct Packet : private Header {
  using Body=std::vector<char>;
  Packet(const Header &h,Body b) : Header(h),body(std::move(b))
  {if(body.size()!=body_size()) throw …;}

  using Header::get_command;
  using Header::get_options;
  const Body& get_body() const {return body;}

private:
  Body body;
};

// For some suitable Stream class:
Header read1(Stream &s)
{return {s.read<Header::SIZE>()};}
Packet read2(const Header &h,Stream &s)
{return {h,s.read(h.body_size())};}
Packet read(Stream &s)
{return read2(read1(s),s);}

Note that the private inheritance prevents undefined behavior from deleting a Packet via a Header*, as well as the surely-unintended

const Packet p=read(s);
const Packet q=read2(p,s);   // same header?!

Composition would of course work as well, but might result in more adapter code in a full implementation.

If you were really optimizing, you could make a HeaderOnly without the body size and derive Header and Packet from that.

Sign up to request clarification or add additional context in comments.

2 Comments

Interesting approach. A few questions remain: Wouldn't it be better practice to make a separate struct for the Body as well? Also why did you choose structs over a class? Furthermore I was thinking if it might be conceivable to make the Packet class itself an interface? Maybe something like an IPacket which other classes can inherit? What do you think about this?
@Kyu96: What purpose would a struct containing just the Body serve? It doesn’t abstract over anything. As for struct vs. class, they are exactly the same except for default access control. You need an interface iff you need dynamic polymorphism, which your problem statement doesn’t suggest.
1

For this case I would use the pipeline design pattern creating 3 packet processor classes:

  • Command (handles magic bytes too)
  • Options
  • Body (handles body size too)

all derived from one base class.

typedef unsigned char byte;

namespace Packet
{
    namespace Processor
    {
        namespace Field
        {
            class Item
            {
            public:
                /// Returns true when the field was fully processed, false otherwise.
                virtual bool operator () (const byte*& begin, const byte* const end) = 0;
            };

            class Command: public Item
            {
            public:
                virtual bool operator () (const byte*& begin, const byte* const end);
            };

            class Options: public Item
            {
            public:
                virtual bool operator () (const byte*& begin, const byte* const end);
            };

            class Body: public Item
            {
            public:
                virtual bool operator () (const byte*& begin, const byte* const end);
            };
        }

        class Manager
        {
        public:
            /// Called every time new data is received
            void operator () (const byte* begin, const byte* const end)
            {
                while((*fields[index])(begin, end))
                {
                    incrementIndex();
                }
            }

        protected:
            void incrementIndex();

            Field::Command command;
            Field::Options options;
            Field::Body body;
            Field::Item* const fields[3] = { &command, &options, &body };
            byte index;
        };
    }
}

8 Comments

That sounds like a very interesting approach, maybe you can provide a rough example for my scenario? Would you make header parsing, body parsing two pipeline steps?
How does the client learn how much data to send to the Manager?
@Davis Herring: The Manager is a state machine parsing the content of packets. It receives at a time an entire packet or a chunk of packet. It deals with fragmented packets.
@Kyu96: It's quite simple changing it to have only 2 steps, header and body.
@Flaviu: Then how does the client receive those packets if, say, one read is long enough to contain several?
|
1

You can use exceptions to prevent creation of incomplete packet objects.

I'd use char pointers instead of vectors for performance.

// not intended to be inherited
class Packet final {
public:
    Packet(const char* data, unsigned int data_len) {
        if(data_len < header_len) {
            throw std::invalid_argument("data too small");
        }

        const char* dataIter = data;

        if(!check_validity(dataIter)) {
            throw std::invalid_argument("invalid magic word");
        }
        dataIter += sizeof(magic);
        memcpy(&command, dataIter, sizeof(command)); // can use cast & assignment, too
        dataIter += sizeof(command);
        memcpy(&options, dataIter, sizeof(options)); // can use cast & assignment, too
        dataIter += sizeof(options);
        memcpy(&body_size, dataIter, sizeof(body_size)); // can use cast & assignment, too
        dataIter += sizeof(body_size);

        if( data_len < body_size+header_len) {
            throw std::invalid_argument("data body too small");
        }

        body = new char[body_size];
        memcpy(body, dataIter, body_size);
    }

    ~Packet() {
        delete[] body;
    }

    int8_t get_command() const {
        return command;
    }

    int16_t get_options() const {
        return options;
    }

    int32_t get_body_size() const {
        return body_size;
    }

    const char* get_body() const {
        return body;
    }

private:
    // assumes len enough, may add param in_len for robustness
    static bool check_validity(const char* in_magic) {
        return ( 0 == memcmp(magic, in_magic, sizeof(magic)) );
    }

    constexpr static char magic[] = {'a','b','c','d'};
    int8_t command;
    int16_t options;
    int32_t body_size;
    char* body;

    constexpr static unsigned int header_len = sizeof(magic) + sizeof(command)
            + sizeof(options) + sizeof(body_size);
};

Note: this is my first post in SO, so please let me know if something's wrong with the post, thanks.

4 Comments

Thanks for your contribution. From what I understand this would only work if the packet length is already known beforehand? The header contains the body length but your constructor requires me to provide a given amount of data in advance that is then parsed? Also it's always helpful to include a usage example of the posted code ;-)
While I don’t think std::vector is a significant performance problem here, if you don’t want to use it you should be using std::unique_ptr<char[]>. Also, “cast & assignment” is undefined behavior (cf. plans for implicit object creation).
@Kyu96 you are right about that, maybe you could use a packetbuilder with header and body setters, throwing exception when incomplete body/packet is requested?
@DavisHerring as you suggested using unique (shared-depending on the context) pointer is the correct way IMO and thanks for the extra info.
0

I'm guessing you are trying Object-oriented networking. If so, the best solution for such parsing would be Flatbuffers or Cap’n Proto C++ code generator. By defining a schema, you will get state machine code that will parse the packets in an efficient and safe way.

Comments

0

I'm late to the party, but I have a similar problem. I'm implementing the GDB stub protocol and I'm thinking about using the chain of responsibility (similar to the pipeline). The idea is to have a base message class msg and various messages

breakpoint_msg, step_msg, mread_msg, etc...

all derived from it. Each of them has a function can_handle(package). When I receive a package I iterate through all message classes calling can_handle(package) function on them. The one that recognizes the message is the one that get instantiated with the package data. I then call exec() on it.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.