7

I am trying to implement a TLV (Type-Length-Value) in C, however I am having issues with getting the dynamic size of value to work correctly.

My struct looks like this:

typedef struct __attribute__((packed)){
   unsigned char type;
   unsigned char length;
   unsigned char * value;
} TLV;

I am trying to cast an array to the struct so that I can easily access the type, and length. For example an array:

unsigned char test[5] = {(unsigned char)'T', 0x03, 0x01, 0x02, 0x03};

Where the 'T' in the array is the type, the first 0x03 is the length.

I am casting the array to the struct like so:

TLV* tlv = (TLV*)test; 

Yet when I try to access the value array I get a segmentation fault, even when I try to access the first element of the value memory address (which should be the first element in the array after the length).

How can I get around this segmentation fault?

2
  • 1
    thats because your value is a pointer and you asign it to point to address 0x01. you cannot do it that way. Commented Jan 12, 2015 at 11:58
  • Pointers are not arrays. Commented Jan 12, 2015 at 12:39

2 Answers 2

5

value is not array, it's a pointer (which is pointing somewhere outside the struct). If you want array (of unknown size), write unsigned char value[1] instead.

typedef struct __attribute__((packed)) {
    unsigned char type;
    unsigned char length;
    unsigned char value[1];
} TLV;

Having an array of size 1 allows you to actually address any number of bytes. That is actually UB, but it is actually used and works properly in all cases I saw.

GCC allows to use arrays of size 0. I am so used to that convention that I forgot that arrays of size 0 are not allowed in C.

Edit:

Long answer

There is a difference between arrays and pointers. While you can use the similar code to work with both, these are still different beasts.

Disclaimer: The following code works in gcc, but it might not be strictly valid. I did not try to make it completely valid.

Let's define two structures:

typedef struct {
    char p[20];
} sa;

typedef struct {
    char *p;
} sp;

And create instances of those:

sa x = { "Hello, world" };
sp y = { "Howdy, world" };

What's the difference between those two?

printf("%s\n", x.p); // prints "Hello, world"
printf("%s\n", y.p); // prints "Howdy, world"

What about addresses of these?

printf("address of x = %p\n", &x); // On my machine it prints 0x7fffacce9b20
printf("address of y = %p\n", &y); // 0x7fffacce9b10

Well.. not really interesting except that these numbers are .. quite similar - both structures are located in about the same spot - in my case it's stack, end of address space, but could be somewhere else.

printf("address of x.p = %p\n", &x.p); // 0x7fffacce9b20
printf("address of y.p = %p\n", &y.p); // 0x7fffacce9b10

Same numbers. As expected actually.

printf("address of x.p[0] = %p\n", &x.p[0]); // 0x7fffacce9b20 - same as before
printf("address of y.p[0] = %p\n", &y.p[0]); // 0x400764 - could be anything

Now these are different. The string "Hello, world" is located in the same spot as structure x, while string "Howdy, world" is located somewhere else - data segment, somewhere in the beginning of address space, but, again could be somewhere else.

So this is the difference: array is some data stored "here", while pointer is only address of the data stored "somewhere".

In your case you want to be able to keep the data somewhere "here" - right after the type and length. This is why you need arrays, not pointers.

I can not find any proof that the TLV implementation above is not UB, but I saw a lot of cases where array of chars was "parsed" by casting it to pointer to some structure. I even wrote a code like this myself.

0-size arrays

As I said before, arrays of size 0 are not allowed by C standard. But they are allowed by GCC and this is handy because it allows you to do the following:

typedef struct {
    unsigned char type;
    unsigned char length;
    unsigned char value[0];
} TLV;

int required_length = 10;
TLV *tlv = (TLV *) malloc(sizeof(TLV) + required_length);

Without 0-size arrays you'd have to add (or subtract? subtract I guess) 1 somewhere in the code above.

Sign up to request clarification or add additional context in comments.

Comments

-1

The following is almost entirely portable and certainly not UB due to aliasing because at no point is test dereferenced so you can forget about it.

What isn't (technically) portable is the assumption that there's no internal padding in the struct TLV.

To obtain portability I've removed the __attribute__((packed)).

If your compiler supports it then you are 100% in the clear with no UB.

That is unless you change value to an aligned type then you are likely to be broken. This all works because sizeof(unsigned char) has to be 1 and type alignment has to divide their size. Remember, if it doesn't malloc(n*sizeof(T)) for some type T is broken as an array of n elements of type T. The C standard is painted into a corner that unsigned char can't be aligned because it's always legit to treat memory as an array of char (either kind).

So the following program will either fail at the assert(.) or execute successfully. On all known platforms it will execute successfully as there are no known platforms that would choose to internally pad the given data structure - whether you specify packed or not.

But why do this:

#include <stddef.h>
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>


typedef struct {
   unsigned char type;
   unsigned char length;
   unsigned char value;
} TLV;

static TLV dummy;

int main(void) {

    //There's no standard way to verify this at compile time.
    //NB: If you stick with packing or leave all the members of TLV the same type
    //Then this is almost certainly NOT an issue.
    //However the cast of test implicitly assumes the following is the case.
    //Here's a run-time check of a static constraint.
    assert(offsetof(TLV,value)==(sizeof(dummy.type)+sizeof(dummy.length)));

    unsigned char test[5] = {(unsigned char)'T', 0x03, 0x01, 0x02, 0x03};

    TLV* tlv=(TLV*)test;

    for(unsigned char i=0;i<tlv->length;++i){
        printf("%u\n",(&tlv->value)[i]);
    }

    (&tlv->value)[0]=253;
    (&tlv->value)[1]=254;
    (&tlv->value)[2]=255;

    for(unsigned char i=0;i<tlv->length;++i){
        printf("%u\n",(&tlv->value)[i]);
    }

    return EXIT_SUCCESS;
}

When you can do this (C99 onwards I'm told) and have no crappy alignment problems:

#include <stdlib.h>
#include <stdio.h>

typedef struct {
   unsigned char type;
   unsigned char length;
   unsigned char value[];//Variable length member.
} TLV;

int main(void) {

    TLV* tlv=malloc(sizeof(TLV)+3*sizeof(unsigned char));

    tlv->type='T';
    tlv->length=3;
    tlv->value[0]=1;
    tlv->value[1]=2;
    tlv->value[2]=3;

    for(unsigned char i=0;i<tlv->length;++i){
        printf("%u\n",tlv->value[i]);
    }

    tlv->value[0]=253;
    tlv->value[1]=254;
    tlv->value[2]=255;

    for(unsigned char i=0;i<tlv->length;++i){
        printf("%u\n",tlv->value[i]);
    }

    free(tlv);

    return EXIT_SUCCESS;
}

Notice that there is no compliant guaranteed way of allocating such things statically because there is no compliant guaranteed way of dictating the layout (and in particular size) of the structure so you cannot know how much space to allocate in the char array.

You can (of course) blend the solutions but if you pack the structure you are likely to break alignment of value (if something needing alignment) and if you don't risk the compiler internally padding TLV. That internal padding is unlikely in the current guise but actually very likely if you upgraded length to type of size_t - which is the natural 'full' answer.

The current length limit of 255 (on almost all platforms) is frankly stingy. It felt mean in 1993 writing in Turbo Pascal. In 2015 it's piffling. At least implement length as `unsigned int' unless you know such a tight ceiling is sufficient.

14 Comments

OP states he is dereferencing the struct pointer Yet when I try to access the value array I get a segmentation fault, even when I try to access the first element of the value memory address (which should be the first element in the array after the length). Where value is the element in the struct and accessed via pointer. Therefore causing you know what ( Undfined b. ).
@2501 He hasn't provided his code but I believe my first specimen answer resolves that problem. Are you saying I've missed the problem?
Your second example is correct, but the first isn't, specifically these problems: Breaking strict aliasing rules here: printf("%u\n",(&tlv->value)[i]): ISO/IEC 9899:201x 6.5,p7 and 6.2.7,p2. And in the same line going out of bounds of the object when the value of i is more than zero: 6.5.6, p8. In both cases Standard clearly says, quoting: the behavior is undefined
@2501 I'll check but I believe you're over interpreting aliasing. Firstly you're not over array bounds. test was allocated as an array. It is definitely OK to perform pointer arithmetic in there. Second there is no dereferencing of aliases. At all, What are the two dereferences of aliases that you think are problematic? You can't pick on access and declare it aliasing. You need two. test is never dereferenced. It's fine.
test is never dereferenced. test is dereferenced right here printf("%u\n",(&tlv->value)[i]); Dereferencing the struct pointer itself causes undefined behavior, since: ISO/IEC 9899:201x 6.2.5, p28 states: Pointers to other types need not have the same representation or alignment requirements. which means that struct pointer cannot represent a char pointer. Even if you argue that &tlv->value doesn't dereference the pointer, the pointer itself isn't valid.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.