How to print an unsigned char array in hexadecimal to a file?

Question

I'm trying to fprintf an unsigned char array to a file in its hexadecimal representation.

To do that, I use this code:

#include <stdio.h>
#include <stdlib.h>

int main()
{
    unsigned char tab[...] = "...";

    FILE* Output = fopen("Output.txt", "w+");
    if (Output == NULL)
        return 1;   /* could not open the output file */

    size_t tabLength = sizeof(tab);   /* byte count of the array */
    for ( size_t i = 0; i < tabLength; i++ )
        fprintf(Output, "%2X", tab[i]);

    fclose(Output);
}

With a small array there is no problem, but as the array grows (200M elements in my case), it takes a lot longer :(

If any of you know a faster way to do the job, I would be glad :)

EDIT: tabLength = strlen(tab) --> tabLength = sizeof(tab);
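One caveat with this edit: sizeof(tab) gives the byte count only while tab is a true array. In the real program the buffer is allocated with malloc (as mentioned in the comments), and there sizeof(tab) yields the size of a pointer instead. Below is a minimal sketch of carrying the length explicitly while loading the file; read_all is a hypothetical helper, not code from the thread, and its error handling is deliberately simple.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads all of `in` into a malloc'd buffer and
   stores the byte count in *len. The caller must free() the result.
   The length has to travel separately: sizeof applied to the returned
   pointer would only yield sizeof(unsigned char *). */
unsigned char *read_all(FILE *in, size_t *len) {
    size_t cap = 65536, n = 0;
    unsigned char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        n += fread(buf + n, 1, cap - n, in);
        if (n < cap)                      /* short read: EOF or error */
            break;
        if (cap > SIZE_MAX / 2) {         /* cannot grow any further */
            free(buf);
            return NULL;
        }
        unsigned char *tmp = realloc(buf, cap * 2);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        cap *= 2;
    }
    if (ferror(in)) {                     /* distinguish error from EOF */
        free(buf);
        return NULL;
    }
    *len = n;
    return buf;
}
```

The returned buffer and len can then be handed together to whatever writes the hex, so no sizeof is involved.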

Allocate another character array of length 2*tabLength. sprintf all your hex digits into it, then write it to the file with one single fwrite.
– DYZ, Commented Aug 4, 2017 at 1:25
@DYZ It's a good idea, but the RAM cost is not so good. If I slice tab into several parts, sprintf each part into the second array and fwrite it, will my method still be efficient, or will it lose its benefit? Also, will fwrite write the hex as plain text like "2E"? That's what I want.
– Tom Clabault, Commented Aug 4, 2017 at 1:43
@DavidBowling The array is not a compile-time literal like that in the real program: it's allocated with malloc and then filled with the 200 MB (or more) read from an input file.
– Tom Clabault, Commented Aug 4, 2017 at 1:45
Yes, you can speed up your program by merging any number of printfs; merge as many as you can afford. fwrite copies the contents of the buffer to the file byte for byte.
– DYZ, Commented Aug 4, 2017 at 1:47
fprintf is buffered (as is fwrite), so it does not make much sense to add yet another layer of buffering on top of it. Do you know where the time is spent? You need to profile. It could be your array copying outside of this loop, or it could just be the int-to-ASCII conversion.
– Serge, Commented Aug 4, 2017 at 1:53
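A minimal sketch of DYZ's suggestion, restricted to a fixed-size buffer to address the RAM concern raised in the follow-up comment. This is not code from the thread: the name dump_hex_sprintf and the CHUNK_BYTES value are hypothetical. It still calls sprintf once per byte, so only the writes are batched; as Serge notes, profiling is the way to tell whether that alone helps.

```c
#include <stdio.h>

/* Hypothetical chunk size: input bytes converted per fwrite call. */
#define CHUNK_BYTES 4096

/* Writes each byte of a[0..size) as two uppercase hex digits to fp,
   staging the text in a fixed-size local buffer so memory use stays
   bounded however large the input is.
   Returns 0 on success, -1 if fwrite reports a short write. */
int dump_hex_sprintf(const unsigned char *a, size_t size, FILE *fp) {
    /* +1 because sprintf always appends a terminating '\0'. */
    char buf[2 * CHUNK_BYTES + 1];

    while (size > 0) {
        size_t n = size < CHUNK_BYTES ? size : CHUNK_BYTES;
        for (size_t i = 0; i < n; i++)
            sprintf(buf + 2 * i, "%02X", (unsigned)a[i]);
        /* Write only the 2*n hex characters, not the trailing '\0'. */
        if (fwrite(buf, 1, 2 * n, fp) != 2 * n)
            return -1;
        a += n;
        size -= n;
    }
    return 0;
}
```

Calling dump_hex_sprintf(tab, tabLength, Output) would replace the fprintf loop from the question. Because it uses "%02X", every byte becomes exactly two characters.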

chqrlie · Accepted Answer · 2017-08-04 07:53:49Z

For a quicker way to dump your buffer in hex into the file, use a hand-coded, chunk-based approach:

no fprintf() overhead
far fewer library function calls, and correspondingly fewer locks.
write in chunks of 4K (or a higher power of 2) to favor page alignment, giving fwrite a chance to bypass its buffering phase.

Here is the code:

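#include <stdio.h>

#define CHUNK 2048  /* input bytes per fwrite: 2048 bytes -> 4096 hex chars */

/* Write a[0..size) to fp as uppercase hex, two characters per byte. */
void dumphex(const unsigned char *a, size_t size, FILE *fp) {
    char buf[CHUNK * 2];
    const char *xdigits = "0123456789ABCDEF";
    size_t i, j;

    /* Full chunks: convert CHUNK bytes, then write 4K of text at once. */
    for (; size >= CHUNK; size -= CHUNK, a += CHUNK) {
        for (i = j = 0; i < CHUNK; i++, j += 2) {
            unsigned char c = a[i];
            buf[j + 0] = xdigits[c >> 4];   /* high nibble */
            buf[j + 1] = xdigits[c & 15];   /* low nibble */
        }
        fwrite(buf, 2, CHUNK, fp);
    }
    /* Remaining tail of fewer than CHUNK bytes. */
    for (i = j = 0; i < size; i++, j += 2) {
        unsigned char c = a[i];
        buf[j + 0] = xdigits[c >> 4];
        buf[j + 1] = xdigits[c & 15];
    }
    fwrite(buf, 2, size, fp);
}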
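Since the data is read from a 200 MB input file in the first place (as mentioned in the question's comments), a variation on the same idea, not part of this answer, is to skip the large array entirely and stream: fread a block, convert it with the same nibble lookup, and fwrite the text. A minimal sketch, assuming both streams are already open (the input in binary mode); hexdump_stream and IN_BLOCK are hypothetical names.

```c
#include <stdio.h>

/* Hypothetical block size: input bytes read per fread call. */
#define IN_BLOCK 2048

/* Reads `in` to EOF and writes every byte as two uppercase hex digits
   to `out`, using only two fixed-size buffers.
   Returns the number of input bytes converted, or -1 on a read or
   write error. */
long long hexdump_stream(FILE *in, FILE *out) {
    static const char xdigits[] = "0123456789ABCDEF";
    unsigned char ibuf[IN_BLOCK];
    char obuf[IN_BLOCK * 2];
    long long total = 0;
    size_t n;

    while ((n = fread(ibuf, 1, sizeof ibuf, in)) > 0) {
        for (size_t i = 0; i < n; i++) {
            obuf[2 * i]     = xdigits[ibuf[i] >> 4];   /* high nibble */
            obuf[2 * i + 1] = xdigits[ibuf[i] & 15];   /* low nibble */
        }
        if (fwrite(obuf, 1, 2 * n, out) != 2 * n)
            return -1;
        total += (long long)n;
    }
    return ferror(in) ? -1 : total;
}
```

Memory use stays constant (two small buffers) regardless of file size, at the cost of not having the whole array in memory for any other processing.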

chux · Accepted Answer · 2017-08-04 03:01:15Z


The code walks the length of tab twice: once in the strlen() call and again in the for loop.

tabLength = strlen(tab); // waste
for ( unsigned int i = 0; i < tabLength; i++ )
    fprintf(Output, "%2X", tab[i]);

Instead, walk it only once:

for ( unsigned int i = 0; tab[i]; i++ )
    fprintf(Output, "%2X", tab[i]);

Further, I believe OP's premise is ill-formed. Using tabLength = strlen(tab); to find the length implies the unsigned char array is a string. Instead, I suspect the true length of the unsigned char array is available in other ways.

Typically, instead of calling fprintf() once per byte to print 2 characters, it is better to work in groups of, say, 16, 64, or 4096 bytes. In that case @DYZ's good idea prevails: print to a local buffer in chunks and then write the buffer.

for (unsigned int i = 0; i < tabLength; i++ )
    fprintf(Output, "%2X", tab[i]);

Minor: a more typical format is "%02X". It depends on whether you want a leading space or a leading 0 for values less than 16.
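The difference is easy to check with snprintf: "%2X" pads single-digit values with a space, "%02X" pads them with a zero. The helper name show_formats is hypothetical and exists only for this demonstration.

```c
#include <stdio.h>

/* Hypothetical demo helper: formats one byte with both conversions,
   separated by '|', into out (outsize should be at least 6).
   For example, 0x0A becomes " A|0A". */
void show_formats(unsigned char b, char *out, size_t outsize) {
    snprintf(out, outsize, "%2X|%02X", (unsigned)b, (unsigned)b);
}
```

With "%2X", the byte 0x0A would land in the file as " A", which makes the dump harder to parse back into bytes.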

edited Aug 4, 2017 at 3:01

answered Aug 4, 2017 at 1:50

chux


5 Comments

Steven K. Mariner Over a year ago

I've always been fond of buffers of size 10240, which would allow these to be written in batches of 5120 ...

chux Over a year ago

@StevenK.Mariner Most interesting. My experience favors powers of 2.

Tom Clabault Over a year ago

Yep, tab is not a string; strlen is a mistake, so your for loop is no longer valid. I'll edit my post.

Steven K. Mariner Over a year ago

@chux, I agree with you, actually. The 10K buffer dates back to early DOS days, when I was balancing between powers of 2 and a buffer size that didn't kill RAM on an old PC XT but gave me sufficient performance gains to matter. Whatever testing I did in those days had me land on 10K, and it sort of became my generic answer to all buffering things (hence the #define USRANS 10240 humor from yesterday's issue). I would definitely recommend a power of 2 in modern computing.

chux Over a year ago

@TomClabault It is bad SO etiquette to change a fundamental premise of the post once answers arrive. It makes the post a moving target and unclear.
