
The following code combines two bytes into one 16 bit integer.

unsigned char byteOne = 0b00000010; // 2
unsigned char byteTwo = 0b00000011; // 3

uint16_t i = 0b0000000000000000;
i = (byteOne << 8) | byteTwo; //515

I'm trying to understand WHY this code works.

If we break this down and focus on just one byte, byteOne: this is an 8 bit value equal to 00000010. So left-shifting it by 8 bits should always yield 00000000 (as the bits shifted off the end are lost), right? This seems to be the case with the following code:

uint8_t i = (byteOne << 8); // equal to 0, always, no matter what 8 bit value is assigned to byteOne

But if this way of thinking was correct, then

uint16_t i = (byteOne << 8) | byteTwo;

Should be equivalent to

uint16_t i = 0 | byteTwo; // Because 0b00000010 << 8 == 0b00000000

Or just

uint16_t i = byteTwo; // Because 0b00000000 | 0b00000011 == 0b00000011

But they're not equivalent and this is throwing me off. Is byteOne being cast/converted into a 16 bit int before the shifting operation? That would explain what's going on here as then

0b0000000000000010 << 8 == 0b0000001000000000 // 512

If byteOne isn't being converted into a 16 bit int before the shifting operation, then please explain why the (byteOne << 8) isn't evaluating to 0 when assigning to a 16 bit integer.

  • integral promotion to int from small types. Commented Dec 30, 2020 at 0:25
  • Have you tried auto i = (byteOne << 8); and seen what type the compiler chooses to give i (which is the type of the expression byteOne << 8)? Commented Dec 30, 2020 at 0:25
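The experiment suggested in the comment above could look roughly like this. It is only a sketch: the helper names shift_result_is_int and shifted_value are made up for illustration, and it assumes a typical platform where int can represent every unsigned char value (otherwise the promoted type would be unsigned int).

```cpp
#include <type_traits>

// Hypothetical helper: checks the type of the expression byteOne << 8.
// Assumption: int can hold all unsigned char values, so promotion yields int.
inline bool shift_result_is_int() {
    unsigned char byteOne = 2;
    return std::is_same<decltype(byteOne << 8), int>::value;
}

// Hypothetical helper: lets auto pick the type of the shift expression.
inline int shifted_value() {
    unsigned char byteOne = 2;
    auto shifted = byteOne << 8;  // deduced from the promoted left operand
    return shifted;               // the set bit has moved up, not fallen off
}
```

If the deduced type is int, the shifted bits have room to live in, which is why the result is 512 rather than 0.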

2 Answers


Yes--when you do almost any sort of operation on any value smaller than an int the first thing that happens is that the value is promoted to int (or, in some cases, unsigned int).

In case you really care about the details that apply here (§[conv.prom]/1):

A prvalue of an integer type other than bool, char16_t, char32_t, or wchar_t whose integer conversion rank (6.8.4) is less than the rank of int can be converted to a prvalue of type int if int can represent all the values of the source type; otherwise, the source prvalue can be converted to a prvalue of type unsigned int.

Then the operation happens on the promoted value (§[expr.shift]/1):

The shift operators << and >> group left-to-right. [...] The operands shall be of integral or unscoped enumeration type and integral promotions are performed. The type of the result is that of the promoted left operand.
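To make the two quoted rules concrete, here is a sketch that writes out the implicit steps by hand. The function names combine_bytes and shift_into_byte are hypothetical, and the code assumes int is wider than 16 bits (as on common desktop compilers), so that the shifted value always fits.

```cpp
#include <cstdint>

// Hypothetical helper: combines two bytes, spelling out what the compiler
// does implicitly. Assumption: int is wider than 16 bits.
inline std::uint16_t combine_bytes(unsigned char high, unsigned char low) {
    int promotedHigh = high;                          // integral promotion
    int promotedLow = low;                            // integral promotion
    int result = (promotedHigh << 8) | promotedLow;   // computed at int width
    return static_cast<std::uint16_t>(result);        // narrowed on assignment
}

// Hypothetical helper: the same shift, but stored into an 8 bit type.
inline std::uint8_t shift_into_byte(unsigned char value) {
    int shifted = value << 8;                   // low 8 bits are now all zero
    return static_cast<std::uint8_t>(shifted);  // only the low 8 bits survive
}
```

With byteOne = 2 and byteTwo = 3, combine_bytes gives 515, while shift_into_byte gives 0 for any input, which matches both observations in the question.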


2 Comments

Thanks Jerry. So what happens with uint8_t i = (byteOne << 8)? Is it not promoted then because I'm assigning the value into an 8 bit type? And what if I had 8 bytes that I wanted to combine into a data type larger than int, like uint64_t, would the compiler know to convert each of the 8 bytes to 64 bit types before performing the shift operations?
In the first case, it gets promoted to int, then the result is truncated back to uint8_t. When you're dealing with something larger than int, you generally want to convert the values to the correct type before you do the operation.
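For the 8 byte case asked about in the comment, one way to convert before shifting might look like this. It is a sketch only: place_byte and combine_eight are hypothetical names, and the most-significant-byte-first order is an assumption chosen for the example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: widens one byte to 64 bits BEFORE shifting it,
// so the shift is performed at 64 bit width rather than at int width.
inline std::uint64_t place_byte(unsigned char byte, unsigned index) {
    return static_cast<std::uint64_t>(byte) << (8 * index);
}

// Hypothetical helper: packs 8 bytes into a uint64_t,
// treating bytes[0] as the most significant byte (an assumption).
inline std::uint64_t combine_eight(const std::array<unsigned char, 8>& bytes) {
    std::uint64_t value = 0;
    for (std::size_t k = 0; k < bytes.size(); ++k) {
        // value is already 64 bits wide, so value << 8 cannot lose
        // bits the way a shift performed at int width could.
        value = (value << 8) | static_cast<std::uint64_t>(bytes[k]);
    }
    return value;
}
```

Without the cast, a byte shifted by 32 or more would be shifted as an int, which is not what is wanted for a 64 bit result.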
0

As the shift does not happen in place (byteOne = byteOne << 8), the compiler needs to use a register for the intermediate result. In the line i = (byteOne << 8) | byteTwo; the size of the register for the intermediate result is not specified (for example with a cast). Only the final result has to be uint16_t. So for the intermediate result it's up to the compiler.

When your code snippet is fed to a compiler, you could get the following assembly code:

;// copy the two bytes and the word onto the stack
movb    $2, -1(%rbp) ;// uint8_t byteOne = 2
movb    $3, -2(%rbp) ;// uint8_t byteTwo = 3
movw    $0, -4(%rbp) ;// uint16_t i = 0
;// move byteOne into the accumulator register (32 bit), zero-extended
movzbl  -1(%rbp), %eax ;// uint32_t temp = byteOne
;// shift left by 8
sall    $8, %eax ;// temp = temp << 8
;// move temp to a different register
movl    %eax, %edx ;// uint32_t temp2 = temp
;// move byteTwo into the accumulator register (32 bit), zero-extended
movzbl  -2(%rbp), %eax ;// temp = byteTwo
;// bitwise or of temp2 and temp, result stored in %eax
orl     %edx, %eax ;// temp = temp | temp2
;// copy the low 16 bits back to the stack location of i
movw    %ax, -4(%rbp) ;// i = (uint16_t)temp

%eax is a 32-bit register, therefore no overflow. The cast to uint16_t is done explicitly by the move-word instruction movw %ax, -4(%rbp), which stores only the low 16 bits.

I'm not sure how the compiler decides which register size to use for these intermediate results, but I suspect that it depends on your system and compiler.

The compiler on my system, g++.exe (x86_64-posix-seh-rev1, Built by MinGW-W64 project) 7.2.0, seems to use 32 bit registers by default. The following code used 32 bit registers too, and therefore did not return the expected result:

unsigned char byteOne = 0b00000010; // 2
unsigned char byteTwo = 0b00000011; // 3
uint16_t i = 0b0000000000000000;
i = ((byteOne << 32) | byteTwo << 24) >> 24; // 3

The same 32 bit %eax register is used, therefore the overflow occurred. So if the intermediate result does not exceed 32 bits, the result is as expected, as with:

unsigned char byteOne = 0b00000010; // 2
unsigned char byteTwo = 0b00000011; // 3 
uint16_t i = 0b0000000000000000;
i = ((byteOne << 16) | byteTwo << 8) >> 8; // 515

The compiler for an 8-bit microcontroller will most certainly give a different result.

6 Comments

This seems to be more confirmation of the OP's result than an explanation of why that result is mandated by C++ / of why that C++ source resulted in this particular batch of assembly. (If C++ had called for a different result, different assembly would have been produced.)
Of course it confirms the OP's result, but it is an explanation. Did you read it completely? i = (byteOne << 8) | byteTwo; produces intermediate results with no specified bit width, so it's up to the compiler to choose the right registers from the available ones in the system. C++ code is just text, there is no direct machine code representation, it's up to the compiler to interpret it.
I'm more into the hardware- and low level programming. Therefore I like explanations which express the cause of something more, than something like: C++ language allows integer promotion and it may be converted to int.
"i = (byteOne << 8) | byteTwo; produces intermediate results with no specified bit width" -- this is false. The bit width of the intermediate result is specified by the C++ standard to be the same as that of an int.
"I like explanations which express the cause of something" -- that's fine, but when you look at the machine instructions produced by a C++ compiler, you are looking at an effect, not a cause. The compiler produced those instructions because they produce the result called for by the C++ standard. For example: "%eax is a 32-bit register, therefore no overflow" neglects to consider that the compiler would have simulated an overflow if overflow was required by the standard. The cause is the C++ standard. The effect is that the machine instructions do not force an overflow.
