1

I'm doing an assignment for school to swap the bytes in an unsigned long, and return the swapped unsigned long. ex. 0x12345678 -> 0x34127856. I figured I'll make a char array, use sprintf to insert the long into a char array, and then do the swapping, stepping through the array. I'm pretty familiar with c++, but C seems a little more low level. I researched a few topics on sprintf, and I tried to make an array, but I'm not sure why it's not working.

unsigned long swap_bytes(unsigned long n) {
char new[64];

sprintf(new, "%l", n);

printf("Char array is now: %s\n", new);
}
4
  • you are in trouble :-) I am going to try and post an answer with an explanation of WHY you're in trouble Commented Sep 9, 2017 at 17:27
  • Use asprintf or snprintf Commented Sep 9, 2017 at 19:39
  • compile with gcc -Wall -Wextra -g Commented Sep 9, 2017 at 20:18
  • And using new for a variable name is not really a good idea, given it's a keyword in many languages. Commented Sep 11, 2017 at 0:40

3 Answers 3

2

TLDR; The correct approach is at the bottom

Preamble

Issues with what you're doing

First off using sprintf for byte swapping is the wrong approach because

  1. it is a MUCH MUCH slower process than using the mathematical properties of bit operations to perform the byte swapping.

  2. A byte is not a digit in a number. (a wrong assumption that you've made in your approach)

  3. It's even more painful when you don't know the size of your integer (is it 32-bits, 64 bits or what)

The correct approach

Use bit manipulation to swap the bytes (see way way below)

The absolutely incorrect implementation with wrong output (because we're ignoring issue #2 above)

There are many technical reasons why sprintf is much slower but suffice it to say that it's so because moving contents of memory around is a slow operation, and of course more data you're moving around the slower it gets:

In your case, by changing a number (which sits in one manipulatable 'word' (think of it as a cell)) into its human readable string-equivalence you are doing two things:

  1. You are converting (let's assume a 64-bit CPU) a single number represented by 8 bytes in a single CPU cell (officially a register) into a human equivalence string and putting it in RAM (memory). Now, each character in the string now takes up at least a byte: So a 16 digit number takes up 16 bytes (rather than 8)

  2. You are then moving these characters around using memory operations (which are slow compared do doing something directly on CPU, by factor of a 1000)

  3. Then you're converting the characters back to integers, which is a long and tedious operation

However, since that's the solution that you came up with let's first look at it.

The really wrong code with a really wrong answer

Starting (somewhat) with your code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

unsigned long swap_bytes(unsigned long n) {
     int i, l;
     char new[64]; /* the fact that you used 64 here told me that you made assumption 2 */
     sprintf(new, "%lu", n); /* you forgot the `u` here */

     printf("The number is: %s\n", new); /* well it shows up :) */

     l = strlen(new);
      for(i = 0; i < l; i+=4) {
          char tmp[2];
          tmp[0] = new[i+2]; /* get next two characters */
          tmp[1] = new[i+3];  
          new[i+2] = new[i];
          new[i+3] = new[i+1];
          new[i] = tmp[0];
          new[i+1] = tmp[1];  
     }

     return strtoul(new, NULL, 10); /* convert new back */
}

/* testing swap byte */
int main() {
      /* seems to work: */
      printf("Swapping 12345678: %lu\n", swap_bytes(12345678));
      /* how about 432? (err not) */
      printf("Swapping 432: %lu\n", swap_bytes(432));
}

As you can see the above is not really byte swapping but character swapping. And any attempt to try and "fix" the above code is nonsensical. For example,how do we deal with odd number of digits?

Well, I suppose we can pad odd digit counts with a zero:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
unsigned long swap_bytes(unsigned long n) {
     int i, l;
     char new[64]; /* the fact that you used 64 here told me that you made assumption 2 */
     sprintf(new, "%lu", n); /* you forgot the `u` here */

     printf("The number is: %s\n", new); /* well it shows up :) */

     l = strlen(new);
     if(l % 2 == 1) { /* check if l is odd */
         printf("adding a pad to make n even digit count");
         sprintf(new, "0%lu", n);
         l++; /* length has increased */          
     }

     for(i = 0; i < l; i+=4) {
          char tmp[2];
          tmp[0] = new[i+2]; /* get next two characters */
          tmp[1] = new[i+3];  
          new[i+2] = new[i];
          new[i+3] = new[i+1];
          new[i] = tmp[0];
          new[i+1] = tmp[1];  
     }

     return strtoul(new, NULL, 10); /* convert new back */
}

/* testing swap byte */
int main() {
      /* seems to work: */
      printf("Swapping 12345678: %lu\n", swap_bytes(12345678));
      printf("Swapping 432: %lu\n", swap_bytes(432));
      /* how about 432516? (err not) */
      printf("Swapping 432: %lu\n", swap_bytes(432));
}

Now we run into an issue with numbers which are not divisible by 4... Do we pad them with zeros on the right or the left or the middle? err NOT REALLY.

In any event this entire approach is wrong because we're not swapping bytes anyhow, we're swapping characters.

Now what?

So you may be asking

what the heck is my assignment talking about?

Well numbers are represented as bytes in memory, and what the assignment is asking for is for you to get that representation and swap it.

So for example, if we took a number like 12345678 it's actually stored as some sequence of bytes (1 byte == 8 bits). So let's look at the normal math way of representing 12345678 (base 10) in bits (base 2) and bytes (base 8):

(12345678)10 = (101111000110000101001110)2

Splitting the binary bits into groups of 4 for visual ease gives:

(12345678)10 = (1011 1100 0110 0001 0100 1110)2

But 4 bits are equal to 1 hex number (0, 1, 2, 3... 9, A, B...F), so we can convert the bits into nibbles (4-bit hex numbers) easily:

(12345678)10 = 1011 | 1100 | 0110 | 0001 | 0100 | 1110 (12345678)10 = B | C | 6 | 1 | 4 | E

But each byte (8-bits) is two nibbles (4-bits) so if we squish this a bit:

(12345678)10 = (BC 61 4E)16

So 12345678 is actually representable in 3 bytes;

However CPUs have specific sizes for integers, usually these are multiples of 2 and divisible by 4. This is so because of a variety of reasons that are beyond the scope of this discussion, suffice it to say that you will get things like 16-bit, 32-bit, 64-bit, 128-bit etc... And most often the CPU of a particular bit-size (say a 64bit CPU) will be able to manipulate unsigned integers representable in that bit-size directly without having to store parts of the number in RAM.

Slight Digression

So let's say we have a 32-bit CPU, and somewhere at byte number α in RAM. The CPU could store the number 12345678 as:

>    00     BC     61       4E    
>    ↑ α    ↑ α+1  ↑ α+2    ↑ α+3

(Figure 1)

Here the most significant part of the number, is sitting at the lowest memory address index α

Or the CPU could store it differently, where the least significant part of the number is sitting at the lowest memory.

>    4E     61     BC       00    
>    ↑ α    ↑ α+1  ↑ α+2    ↑ α+3

(Figure 2)

The way a CPU stores a number is called Endianness (of the CPU). Where, if the most significant part is on the left then it's called Big-Endian CPU (Figure 1), or Little-Endian if it stores it as in (Figure 2)

Getting the correct answer (the wrong way)

Now that we have an idea of how things may be stored, let's try and pull this out still using sprintf.

We're going to use a couple of tricks here:

  1. we'll convert the numbers to hexadecimal and then pad the number to 8 bytes
  2. we'll use printf's (therefore sprintf) format string capability that if we want to use a variable to specify the width of an argument then we can use a * after the % sign like so:

       printf("%*d", width, num);
    
  3. If we set our format string to %0*x we get a hex number that's zero padded in output automatically, so:

     sprintf(new, "%0*llx", sizeof(n), n); 
    

Our program then becomes:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>


unsigned long swap_bytes(unsigned long n) {
     int i, l;
     char new[64] = ""; 
     sprintf(new, "%0*llx", sizeof(n), n); 

     printf("The number is: %s\n", new);

     l = strlen(new);

     for(i = 0; i < l; i+=4) {
          char tmp[2];
          tmp[0] = new[i+2]; /* get next two characters */
          tmp[1] = new[i+3];  
          new[i+2] = new[i];
          new[i+3] = new[i+1];
          new[i] = tmp[0];
          new[i+1] = tmp[1];  
     }

     return strtoul(new, NULL, 16); /* convert new back */
}


/* testing swap byte */
int main() {
      printf("size of unsigned long is %ld\n", sizeof(unsigned long));
      printf("Swapping 12345678: %llx\n", swap_bytes(12345678));
                /* how about 123456? */
      printf("Swapping 123456: %llx\n", swap_bytes(123456));

      printf("Swapping 123456: %llx\n", swap_bytes(98899));
}

The output would look something like:

size of unsigned long is 8
The number is: 00bc614e
Swapping 12345678: bc004e61
The number is: 0001e240
Swapping 123456: 10040e2
The number is: 00018253
Swapping 123456: 1005382

Obviously we can change our outputs by using %ld and print the base 10 versions of the numbers, rather than base 16 as is happening above. I'll leave that to you.

Now let's do it the right way

This is however rather terrible, since byte swapping can be done much faster without ever doing the integer to string and string to integer conversion.

Let's see how that's done:

The rather explicit way

Before we go on, just a bit on bit shifting in C:

If I have a number, say 6 (=1102) and I shift all the bits to the left by 1 I would get 12 (11002) (we simply shifted everything to the left adding zeros on the right as needed)

This is written in C as 6 << 1.

A right shift is similar and can be expressed in C with >> so if I have a number say 240 = (11110000)2 and I right-shift it 4 times I would get 15 = (1111)2 this is expressed as 240 >> 3

Now we have unsigned long integers which are (in my case at least) 64 bits long, or 8 bytes long.

Let's say my number is 12345678 which is (00 00 00 00 00 bc 61 4e)16 in hex at 8 bytes long. If I want to get the value of byte number 3 I can extract it by taking the number 0xFF (1111 1111) all bits of a byte set to 1 and left shifting it until i get to the byte 3 (so left shift 3*8 = 24 times) performing a bitwise and with the number and then right shifting the results to get rid of the zeros. This is what it looks like:

 0xFF << (3 * 8) = 0xFF0000 & 0000 0000 00bc 614e = 0000 0000 00bc 0000

Now right shift:

 0xFF0000 & 0000 0000 00bc 0000 >> (3 * 8) = bc

Another (better) way to do it would be to right shift first and then perform bitwise and with 0xFF to drop all higher bits:

0000 0000 00bc 614e >> 24 = 0000 0000 0000 00bc & 0xFF = bc

We will use the second way, and make a macro using #define now we can add the bytes back at the right location by right shifting each kth byte k+1 times and each k+1st byte k times.

Here is a sample implementation of this:

#define GET_BYTE(N, B) ((N >> (8 * (B))) & 0xFFUL)

unsigned long swap_bytes(unsigned long n)
{
    unsigned long long rv = 0ULL;
    int k;
    printf("number is %016llx\n", n);
    for(k =0 ; k < sizeof(n); k+=2) {
        printf("swapping bytes %d[%016lx] and %d[%016lx]\n", k, GET_BYTE(n, k),
            k+1, GET_BYTE(n, k+1));

        rv += GET_BYTE(n, k) << 8*(k+1);
        rv += GET_BYTE(n, k+1) << 8*k;
    }
    return rv;
}

/* testing swap byte */
int main() {
      printf("size of unsigned long is: %ld\n", sizeof(unsigned long));
      printf("Swapping 12345678: %llx\n", swap_bytes(12345678));
                /* how about 123456? */
      printf("Swapping 123456: %llx\n", swap_bytes(123456));

      printf("Swapping 123456: %llx\n", swap_bytes(98899));
}

But this can be done so much more efficiently. I leave it here for now. We'll come back to using bit blitting and xor swapping later.

Update with GET_BYTE as a function instead of a macro:

#define GET_BYTE(N, B) ((N >> (8 * (B))) & 0xFFUL)

Just for fun we also use a shift operator for multiplying by 8. You can note that left shifting a number by 1 is like multiplying it by 2 (makes sense since in binary 2 is 10 and multiplying by 10 adds a zero to the end and therefore is the same as shifting something left by one space) So multiplying by 8 (1000)2 is like shifting something three spaces over or basically tacking on 3 zeros (overflows notwithstanding):

unsigned long __inline__ get_byte(const unsigned long n, const unsigned char idx) {
       return ((n >> (idx << 3)) & 0xFFUL);
}

Now the really really fun and correct way to do this

Okay so a fast way to swap integers around is to realize that if we have two integers x, and y we can use properties of xor function to swap their values. The basic algorithm is this:

X := X XOR Y
Y := Y XOR X
X := X XOR Y

Now we know that a char is one byte in C. So we can force the compiler to treat the 8 byte integer as a sequence of 1-byte chars (hehe it's a bit of a mind bender considering everything I said about not doing it in sprintf) but this is different. You have to just think about it a bit.

We'll take the memory address of our integer, cast it to a char pointer (char *) and treat the result as an array of chars. Then we'll use the xor function property above to swap the two consecutive array values.

To do this I am going to use a macro (although we could use a function) but using a function will make the code uglier.

One thing you'll note is that there is the use of ?: in XORSWAP below. That's like an if-then-else in C but with expressions rather than statements, so basically (conditional_expression) ? (value_if_true) : (value_if_false) means if conditional_expression is non-zero the result will be value_if_true, otherwise it will be value_if_false. AND it's important not to xor a value with itself because you will always get 0 as a result and clobber the content. So we use the conditional to check if the addresses of the values we are changing are DIFFERENT from each other. If the addresses are the same (&a == &b) we simply return the value at the address (&a == &b) ? a : (otherwise_do_xor)

So let's do it:

#include <stdio.h>

/* this macro swaps any two non floating C values that are at 
 * DIFFERENT memory addresses. That's the entire &a == &b ? a : ... business 
 */
#define XORSWAP(a, b)   ((&(a) == &(b)) ? (a) : ((a)^=(b),(b)^=(a),(a)^=(b)))

unsigned long swap_bytes(const unsigned long n) {
    unsigned long rv = n; /* we are not messing with original value */
    int k;
    for(k = 0; k < sizeof(rv); k+=2) {
        /* swap k'th byte with k+1st byte */
        XORSWAP(((char *)&rv)[k], ((char *)&rv)[k+1]);
    }
    return rv;
}

int main()
{
    printf("swapped: %lx", swap_bytes(12345678));
    return 0;
}

Here endeth the lesson. I hope that you will go through all the examples. If you have any more questions just ask in comments and I'll try to elaborate.

Sign up to request clarification or add additional context in comments.

5 Comments

I've never seen macros like that, I usually worked with c++. What's the advantage over just making a function inside swap_bytes? Also, thank you very much. This is a great explanation of what's going on.
There's no particular advantage. You can think of a macro as manually typing the content of the macro as is wherever it's put in the code (so that happens BEFORE compiling). There's no particular advantage in this case, so you could create function (and even declare it inline so it expands to code) and get all good goodies like type-checking etc. However there are times when macros become powerful and advantageous in creating clean code, this is not really one of them. I'll add inline function version of get_byte
What if the data passed into swap_bytes was a char pointer or cstring, and not an unsigned long? How would we get say byte 3 and 4 from the data in the char pointer, and store it in another variable (say, short int) for some other kind of testing?
That's a different use case from OP. Again depending on the situation you would approach it differently. For example to look at the bytes unsigned short you could simply do a cast on the memory pointer as an unsigned short. You can do an assignment again using a Typecast. If you have any questions about that post it and I'll see if I can reply
Thanks, I've been working with pointers and bits a lot more and I figured it out. Appreciate the time you put into this it really helped a lot!
2
unsigned long swap_bytes(unsigned long n) { 
    char new[64];
    sprintf(new, "%lu", n);
    printf("Char array is now: %s\n", new);
}

You need to use %lu - long unsigned, for format in sprintf(), the compiler should also given you conversion lacks type warning because of this.

1 Comment

This works - thanks. I'm new to C, and the book we got for the class is from 1980's, so it doesn't cover anything bigger than ints.
0

To get it to print you need to use %lu (for unsigned)

It doesn't seem like you attempted the swap, could I see your try?

1 Comment

I haven't attempted it yet, I wanted the array to work first. I'll post it in a bit when I get it working.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.