9

A char is 1 byte and an integer is 4 bytes. I want to copy byte-by-byte from a char[4] into an integer. I thought of different methods but I'm getting different answers.

char str[4]="abc";
unsigned int a = *(unsigned int*)str;
unsigned int b = str[0]<<24 | str[1]<<16 | str[2]<<8 | str[3];
unsigned int c;
memcpy(&c, str, 4);
printf("%u %u %u\n", a, b, c);

Output is 6513249 1633837824 6513249

Which one is correct? What is going wrong?

2
  • The first way is similar to doing a union and as the answers below say relies on the endianness of processors. Commented Oct 11, 2013 at 17:30
  • 5
    Use printf("%08X %08X %08X\n", a, b, c); and notice how all the same bytes are there, but in different order. Commented Oct 11, 2013 at 17:35

6 Answers 6

15

It's an endianness issue. When you interpret the char* as an int* the first byte of the string becomes the least significant byte of the integer (because you ran this code on x86 which is little endian), while with the manual conversion the first byte becomes the most significant.

To put this into pictures, this is the source array:

   a      b      c      \0
+------+------+------+------+
| 0x61 | 0x62 | 0x63 | 0x00 |  <---- bytes in memory
+------+------+------+------+

When these bytes are interpreted as an integer in a little endian architecture the result is 0x00636261, which is decimal 6513249. On the other hand, placing each byte manually yields 0x61626300 -- decimal 1633837824.
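
The comparison above can be reproduced with a minimal sketch like the one below. It assumes the fixed-width `uint32_t` from `<stdint.h>` in place of `unsigned int`, and the helper names `native_value`, `big_endian_value` and `show_both` are made up for illustration. Printing in hexadecimal makes the byte order visible.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Reinterpret 4 bytes in the host's byte order (well defined via memcpy). */
static uint32_t native_value(const char *bytes)
{
    uint32_t v;
    memcpy(&v, bytes, sizeof v);
    return v;
}

/* Place the bytes manually: the first byte becomes the most significant. */
static uint32_t big_endian_value(const char *bytes)
{
    const unsigned char *p = (const unsigned char *)bytes;
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Print both interpretations as 8 hex digits each. */
static void show_both(const char *bytes)
{
    printf("%08lX %08lX\n",
           (unsigned long)native_value(bytes),
           (unsigned long)big_endian_value(bytes));
}
```

Calling `show_both("abc")` on a little-endian host should print `00636261 61626300`: the same four bytes in opposite order.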

Of course treating a char* as an int* is undefined behavior, so the difference is not important in practice because you are not really allowed to use the first conversion. There is however a way to achieve the same result, which is called type punning:

union {
    char str[4];
    unsigned int ui;
} u;

strcpy(u.str, "abc");
printf("%u\n", u.ui);

1 Comment

Thanks. The picture makes it very clear. The answer I wanted was the one with bytes placed manually. BTW, You made a typo- 0x64 in array picture instead of 0x63.
6

Neither of the first two is correct.

The first violates aliasing rules and may fail because the address of str is not properly aligned for an unsigned int. To reinterpret the bytes of a string as an unsigned int with the host system byte order, you may copy it with memcpy:

unsigned int a; memcpy(&a, &str, sizeof a);

(Presuming the size of an unsigned int and the size of str are the same.)

The second may fail with integer overflow because str[0] is promoted to an int, so str[0]<<24 has type int, but the value required by the shift may be larger than is representable in an int. To remedy this, use:

unsigned int b = (unsigned int) str[0] << 24 | …;

This second method interprets the bytes from str in big-endian order, regardless of the order of bytes in an unsigned int in the host system.
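
One further caveat, shown as a hedged sketch: where plain `char` is signed, a byte such as 0x80 is negative, and converting it directly to `unsigned int` sets all the upper bits. Reading the bytes through `unsigned char` first avoids that. The name `load_be32` is hypothetical, and the code assumes 8-bit bytes and the `uint32_t` type from `<stdint.h>`.

```c
#include <stdint.h>

/* Combine 4 bytes in big-endian order without signed-shift overflow. */
static uint32_t load_be32(const char *bytes)
{
    /* Read through unsigned char so bytes >= 0x80 are not sign-extended. */
    const unsigned char *p = (const unsigned char *)bytes;
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}
```

Every shift here operates on an unsigned 32-bit value, so no intermediate result can overflow a signed `int`.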


1
unsigned int a = *(unsigned int*)str;

This initialization is not correct and invokes undefined behavior. It violates the C aliasing rules and potentially violates the processor's alignment requirements.


1

You said you want to copy byte-by-byte.

That means the line unsigned int a = *(unsigned int*)str; is not allowed. However, what you're doing is a fairly common way of reading an array as a different type (such as when you're reading a stream from disk).

It just needs some tweaking:

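const char *str = "abc";      /* 4 bytes, including the terminating '\0' */
size_t i;
unsigned a;
char *c = (char *)&a;         /* any object may be accessed through a char pointer */
for (i = 0; i < sizeof(unsigned); i++) {
    c[i] = str[i];            /* assumes sizeof(unsigned) == 4 */
}
printf("%u\n", a);            /* %u, because a is unsigned */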

Bear in mind, the data you're reading may not share the same endianness as the machine you're reading from. This might help:

void 
changeEndian32(void * data)
{
    uint8_t * cp = (uint8_t *) data;
    union 
    {
        uint32_t word;
        uint8_t bytes[4];
    }temp;

    temp.bytes[0] = cp[3];
    temp.bytes[1] = cp[2];
    temp.bytes[2] = cp[1];
    temp.bytes[3] = cp[0];
    memcpy(data, &temp.word, sizeof temp.word);  /* works even if data is not aligned for uint32_t */
}
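
As a hedged alternative, the same byte swap can be written on the value itself with shifts and masks, which sidesteps both the union and the pointer cast. The name `swap32` is hypothetical, and the sketch assumes `uint32_t` from `<stdint.h>`.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value using only shifts and masks. */
static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}
```

For example, `swap32(0x00636261)` yields `0x61626300`, which is exactly the relationship between the two values printed in the question.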

2 Comments

For union members, results are implementation-dependent if something is stored as one type and extracted as another.
@AlterMann - I didn't know that. I'm interested to learn more. Do you have a reference? My C is almost always 'implementation dependent' so I'm glad to have these things pointed out.
1

Both are correct in a way:

  • Your first solution copies in native byte order (i.e. the byte order the CPU uses) and thus may give different results depending on the type of CPU.

  • Your second solution copies in big endian byte order (i.e. most significant byte at lowest address) no matter what the CPU uses. It will yield the same value on all types of CPUs.

What is correct depends on how the original data (array of char) is meant to be interpreted.
For example, Java class files always use big-endian byte order, no matter what the CPU uses. So if you want to read ints from a Java class file, you have to use the second way. In other cases you might want the CPU-dependent way (I think Matlab writes ints into files in native byte order, cf. this question).
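
To make the choice of byte order explicit instead of leaving it to the CPU, a sketch along these lines may help. The names `load_le32` and `host_is_little_endian` are hypothetical, and the code assumes 8-bit bytes and `uint32_t` from `<stdint.h>`.

```c
#include <stdint.h>
#include <string.h>

/* Combine 4 bytes in little-endian order: the first byte is least significant. */
static uint32_t load_le32(const char *bytes)
{
    const unsigned char *p = (const unsigned char *)bytes;
    return  (uint32_t)p[0]        |
           ((uint32_t)p[1] << 8)  |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}

/* Returns 1 if the host stores the least significant byte first, else 0. */
static int host_is_little_endian(void)
{
    const uint32_t one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);   /* inspect the byte at the lowest address */
    return first == 1;
}
```

`load_le32` returns the same value on every CPU, just as the shift-based big-endian version does. On a host where `host_is_little_endian()` returns 1, it also matches what a plain `memcpy` into a `uint32_t` produces.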

4 Comments

Both of the first two can cause crashes. This should be mentioned in any answer. Neither is correct.
@Eric Postpischil: 1st way: alignment is a completely different issue that has nothing to do with the OP's original question. In very many cases (i.e. on many hardware platforms) alignment doesn't matter at all and code like this is completely OK. 2nd way: this will definitely not result in a crash under any circumstances (no matter whether int is large enough for the value shifted by 24 bits).
Alignment does matter and does have to do with the OP’s original question: Aliasing a char array as an int is not guaranteed to conform to alignment requirements and may crash in some C implementations. The fact that it does not crash on many platforms does not make it okay, because that does not erase the fact that it does crash on some.
The second way may overflow in str[0] << 24. str[0] is a char, so it is promoted to int (except possibly in perverse C implementations where an int is not wider than a char). This is a signed integer. Then shifting it by 24 bits may overflow the range of an int. E.g., if str[0] is 128, then str[0] << 24 would be 2147483648, but the largest value representable by a 32-bit signed int is 2147483647. The behavior of overflow with signed integers is not defined by the C standard. The program may crash or produce incorrect results.
0

If you're using the CVI (National Instruments) compiler, you can use the function Scan to do this:

unsigned int a;

For big endian: Scan(str,"%1i[b4uzi1o3210]>%i",&a);

For little endian: Scan(str,"%1i[b4uzi1o0123]>%i",&a);

The o modifier specifies the byte order. i inside the square brackets indicates where to start in the str array.

