1

I'm trying to write an unsigned char array to a file in its hexadecimal representation using fprintf.

To do that, I use this code:

#include <stdio.h>
#include <stdlib.h>

int main()
{
    unsigned char tab[...] = "...";

    FILE* Output = fopen("Output.txt", "w+");
    if (Output == NULL)
        return EXIT_FAILURE;   /* could not open the output file */

    size_t tabLength = sizeof(tab);
    for ( size_t i = 0; i < tabLength; i++ )
        fprintf(Output, "%2X", tab[i]);

    fclose(Output);
}

With a small array there is no problem, but with big arrays (200M elements in my case) it takes much longer :(

If anyone knows a faster way to do the job, I would be glad :)

EDIT: tabLength = strlen(tab) --> tabLength = sizeof(tab);

5
  • Allocate another character array of length 2*tabLength. sprintf all your hex digits into it. Write it to the file with a single fwrite. Commented Aug 4, 2017 at 1:25
  • @DYZ It's a good idea, but the RAM cost is not so good. If I slice tab into several parts, sprintf each part into the second array, and fwrite it, will the method still be efficient or will it lose its benefit? Also, will fwrite write the hex as plain text like "2E"? That is what I want. Commented Aug 4, 2017 at 1:43
  • @DavidBowling The array is not compiled in like that; it's "mallocated" and then filled with 200MB from an input file of 200MB or more. Commented Aug 4, 2017 at 1:45
  • Yes, you can speed up your program by merging any number of printfs. Merge as many as you can afford. fwrite will faithfully copy the content of the buffer to the disk. Commented Aug 4, 2017 at 1:47
  • 1
    fprintf is buffered (as is fwrite), so it does not make much sense to add yet another layer of buffering on top of it. Do you know where the time is spent? You need to profile. It could be your array copying outside of this code, or it could just be the int-to-ASCII conversion. Commented Aug 4, 2017 at 1:53

2 Answers

3

For a quicker way to dump your buffer in hex into the file, use a hand-coded, chunk-based approach:

  • no fprintf() overhead
  • far fewer library function calls, and correspondingly fewer locks
  • writes in chunks of 4K (or a higher power of 2) to favor page alignment, giving fwrite a chance to bypass the buffering phase

Here is the code:

#define CHUNK 2048   /* input bytes per batch; each batch yields 4096 output characters */

void dumphex(const unsigned char *a, size_t size, FILE *fp) {
    char buf[CHUNK * 2];
    const char *xdigits = "0123456789ABCDEF";
    size_t i, j;

    for (; size >= CHUNK; size -= CHUNK, a += CHUNK) {
        for (i = j = 0; i < CHUNK; i++, j += 2) {
            unsigned char c = a[i];
            buf[j + 0] = xdigits[c >> 4];
            buf[j + 1] = xdigits[c & 15];
        }
        fwrite(buf, 2, CHUNK, fp);
    }
    for (i = j = 0; i < size; i++, j += 2) {
        unsigned char c = a[i];
        buf[j + 0] = xdigits[c >> 4];
        buf[j + 1] = xdigits[c & 15];
    }
    fwrite(buf, 2, size, fp);
}

2

The code runs the length of tab twice: once in the strlen() call and once in the for loop.

tabLength = strlen(tab); // waste
for ( unsigned int i = 0; i < tabLength; i++ )
    fprintf(Output, "%2X", tab[i]);

Instead, run the length only once.

for ( unsigned int i = 0; tab[i]; i++ )
    fprintf(Output, "%2X", tab[i]);

Further, I believe OP's premise is ill-formed. Using tabLength = strlen(tab); to find the length implies the unsigned char array is a string. Instead I suspect the true length of the unsigned char array may be found in other ways.


Typically, repeated calls to fprintf() that each print 2 characters are better done in groups of, say, 16, 64, 4096, etc. In that case @DYZ's good idea prevails: print to a local buffer in chunks, then write the buffer.

for (unsigned int i = 0; i < tabLength; i++ )
    fprintf(Output, "%2X", tab[i]);
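
As a rough sketch of that batching idea (not the OP's or @DYZ's actual code): the helper name dump_hex_chunked and the batch size HEX_CHUNK below are my own assumptions, and it uses "%02X" rather than the "%2X" from the question. It formats a bounded number of bytes into a local buffer and hands each batch to a single fwrite, so the extra memory stays fixed no matter how large the input is.

```c
#include <stdio.h>
#include <stddef.h>

/* Assumed batch size: input bytes formatted per fwrite call (tunable). */
#define HEX_CHUNK 4096

/* Hypothetical helper: writes `size` bytes of `data` to `fp` as uppercase hex.
   Returns 0 on success, -1 if a write comes up short. */
int dump_hex_chunked(const unsigned char *data, size_t size, FILE *fp)
{
    /* Two hex digits per byte, plus one byte for snprintf's terminator. */
    char buf[HEX_CHUNK * 2 + 1];

    while (size > 0) {
        size_t n = size < HEX_CHUNK ? size : HEX_CHUNK;

        /* Format this batch into the local buffer. */
        for (size_t i = 0; i < n; i++)
            snprintf(buf + 2 * i, 3, "%02X", (unsigned)data[i]);

        /* One library write per batch instead of one per byte. */
        if (fwrite(buf, 1, 2 * n, fp) != 2 * n)
            return -1;

        data += n;
        size -= n;
    }
    return 0;
}
```

It would be called as dump_hex_chunked(tab, tabLength, Output). Whether this beats plain fprintf by much depends on where the time actually goes, so it is worth measuring both.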

Minor: A more typical format is "%02X". It depends on whether you want a leading space or a leading 0 for values less than 16.
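
To make the difference concrete, here is a small sketch; the two helper names are made up for illustration. Each one formats a single byte into a 3-character buffer using one of the two conversions.

```c
#include <stdio.h>

/* Hypothetical helper: width 2, padded with a space ("%2X"). */
void hex_space_padded(char out[3], unsigned char value)
{
    snprintf(out, 3, "%2X", (unsigned)value);
}

/* Hypothetical helper: width 2, padded with a zero ("%02X"). */
void hex_zero_padded(char out[3], unsigned char value)
{
    snprintf(out, 3, "%02X", (unsigned)value);
}
```

For the value 0x0A the first helper produces " A" and the second produces "0A". For values of 16 and above, both produce the same two digits.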

5 Comments

I've always been fond of buffers size 10240, which would allow these to be written in batches of 5120 ...
@StevenK.Mariner Most interesting. My experience favors powers of 2.
Yep, tab is not a string, strlen is a mistake, and so your for loop is not valid anymore. I'll edit my post.
@chux, I agree with you, actually. The 10K buffer dates back to early DOS days when I was balancing between powers of 2 and a buffer size that didn't kill RAM on an old PC XT but gave me sufficient performance gains to matter. Whatever testing I did in those days had me land on 10K, and it sort of became my generic answer to all buffering things (hence the #define USRANS 10240 humor from yesterday's issue). I would definitely recommend a power of 2 in modern computing.
@TomClabault Bad SO etiquette to change a fundamental premise of the post once answers arrive. It makes the post a moving target and unclear.
