9

I’m much less experienced in C than I am in higher-level languages. At Cisco, we use C, and I sometimes run into something that would be easy to do in Java or Python, but very difficult to do in C. Now is one of those times.

I have a dynamically-allocated array of unsigned integers which I need to convert to a comma-separated string for logging. While the integers are not likely to be very large, they could conceptually be anywhere from 0 to 4,294,967,295 In Python, that’s one short line.

my_str = ','.join(my_list)

How elegantly can people do this in C? I came up with a way, but it’s gross. If anyone knows a nice way to do it, please enlighten me.

11
  • 2
    Can you post the way that you did it in C? Commented Nov 17, 2009 at 0:31
  • 2
    Wait, join() is a member function of the string class!? Commented Nov 17, 2009 at 0:35
  • 3
    Bipedal: this way we only have to implement join once (okay, once per string class, so twice) instead of hundreds of times for every possible type of iterable object. Read ','.join(sequence) as "comma-separated sequence" and ''.join(sequence) as "empty-separated sequence" Commented Nov 17, 2009 at 0:47
  • 1
    @BipedalShark - docs.python.org/library/stdtypes.html#str.join Commented Nov 17, 2009 at 1:10
  • 1
    Bipedal: Yes, actually, it is. Python uses duck-typing and any object can be an iterator by implementing only next and __iter__ methods. Python does not require inheritance to specify interface. Commented Nov 17, 2009 at 5:02

14 Answers 14

9

Code now tested and builds under gcc.

In contrast to other answers, does not mandate C99.

The real problem here is not knowing the length of the string you're going to need. Getting a number is as easy as sprintf("%u", *num) using num to walk your array of ints, but how much space are you going to need? To avoid overrunning a buffer, you have to keep track of a lot of integers.

size_t join_integers(const unsigned int *num, size_t num_len, char *buf, size_t buf_len) {
    size_t i;
    unsigned int written = 0;

    for(i = 0; i < num_len; i++) {
        written += snprintf(buf + written, buf_len - written, (i != 0 ? ", %u" : "%u"),
            *(num + i));
        if(written == buf_len)
            break;
    }

    return written;
}

Notice, that I keep track of how much of the buffer I have used and use snprintf so I don't overrun the end. snprintf will tack on a \0, but since I'm using buf + written I will start at the \0 of the previous snprintf.

In use:

int main() {
    size_t foo;
    char buf[512];

    unsigned int numbers[] = { 10, 20, 30, 40, 1024 };

    foo = join_integers(numbers, 5, buf, 512);
    printf("returned %u\n", foo);
    printf("numbers: %s\n", buf);
}

Outputs:

returned 20
numbers: 10, 20, 30, 40, 1024

Forcing the limiting to kick in, instead of overrunning:

char buf[15];    
foo = join_integers(numbers, 5, buf, 14);
buf[14] = '\0';

Outputs, expectedly:

returned 14
numbers: 10, 20, 30, 4
Sign up to request clarification or add additional context in comments.

8 Comments

Why do you say it does not mandate C99? snprintf is from C99.
Technically, snprintf() is a C99 function and not in C89. However, it is widely enough available not to be a major issue. Also, use sizeof(buf) rather than 512, even in test code.
I suppose one could argue for some platforms that snprintf() is in SUSV2.
@Jonathan Leffler: then why don't just use sizeof(buf) inside the function?
@Spidey: you can't use 'sizeof(buf)' inside a function because C passes pointers, so the size given would be the size of the pointer (probably 4 or 8) and not the size of the array it points at.
|
2

You can actually use a library like Glib, which contains functions like

gchar* g_strjoin (const gchar *separator, ...);

Joins a number of strings together to form one long string, with the optional separator inserted between each of them. The returned string should be freed with g_free().

(you still need to use g_snprintf(), possibly with g_printf_string_upper_bound() to ensure space)

Comments

2

Are you guys getting paid by the line? :-)


f() is declared with a char * parameter for prototyping purposes, just change char -> int. I interpreted the question as requiring a string as output rather than just code to write to a file.

#define PRINT(s, l, x, i) snprintf((s), (l), "%s%d", (i) ? ",":"", (x)[i]);

char *f(size_t len, char *x) {
  size_t  i, j = 0, k;

  for(i = 0; i < len; ++i)
      j += PRINT(NULL, 0, x, i);
  char *result = malloc(k = ++j);
  for(*result = i = j = 0; i < len; ++i)
      j += PRINT(result + j, k - j, x, i);
  return result;
}

Here is a test framework:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

// put f() here

int main(int ac, char **av) {
    for(int i = 1; i < ac; ++i) { 
        printf("%s\n", f(strlen(av[i]), av[i]));
    }
    return 0;
}

1 Comment

If you are willing to use a statically allocated temporary buffer just get rid of the first loop...
2

What about this?

char *join_int_list(const unsigned int *list, size_t n_items)
{
     enum { SIZEOF_INT_AS_STR = sizeof("4294967295,")-1 };
     char *space = malloc(SIZEOF_INT_AS_STR * n_items);
     if (space != 0)
     {
         size_t i;
         char *pad = "";
         char *dst = space;
         char *end = space + SIZEOF_INT_AS_STR * n_items;
         for (i = 0; i < n_items; i++)
         {
              snprintf(dst, end - dst, "%s%u", pad, list[i]);
              pad = ",";
              dst += strlen(dst);
         }
         space = realloc(space, dst - space + 1);
     }
     return(space);
}

It is the responsibility of the caller to release the returned pointer - and to check that it is not null before using it. The 'realloc()' releases extra space if the amount allocated was enough too large to make it worth while. This code is blithely assuming that the values are indeed 32-bit unsigned integers; if they can be bigger, then the enum needs appropriate adjustment.

Tested code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *join_int_list(const unsigned int *list, size_t n_items)
{
    enum { SIZEOF_INT_AS_STR = sizeof("4294967295,")-1 };
    char *space = malloc(SIZEOF_INT_AS_STR * n_items);
    if (space != 0)
    {
        size_t i;
        char *pad = "";
        char *dst = space;
        char *end = space + SIZEOF_INT_AS_STR * n_items;
        for (i = 0; i < n_items; i++)
        {
            snprintf(dst, end - dst, "%s%u", pad, list[i]);
            pad = ",";
            dst += strlen(dst);
        }
        space = realloc(space, dst - space + 1);
    }
    return(space);
}

int main(void)
{
    static unsigned int array[]= { 1, 2, 3, 49, 4294967295U, 0, 332233 };
    char *str = join_int_list(array, sizeof(array)/sizeof(array[0]));
    printf("join: %s\n", str);
    free(str);
    return(0);
}

Checked with valgrind - seems to be OK.


Discussion of converting INT_MAX or UINT_MAX to string:

You can use sizeof("," STRINGIZE(INT_MAX)) instead of hard coding it. The stringize macro is a common cpp tool that can be defined as #define STRINGIZE_(v) #v and #define STRINGIZE(v) STRINGIZE_(v). – R. Pate

@R Pate: good idea - yes, you could do that quite effectively. Actually, there are two interesting ideas in there: the use of string concatenation with sizeof() (parentheses required for clarity - but string concatenation happens early enough that the compiler isn't worried) and the use of a stringize operation on INT_MAX. – Jonathan Leffler

Using a stringize operation on INT_MAX is not a good idea - it just has to be a "constant expression", not necessarily a sequence of digits. It could be defined as ((1<<32)-1), or even something whacky like __int_max, as long as the compiler lets you use it anywhere you can use a constant expression. – caf

And @caf is right. Consider this code:

#include <limits.h>
#include <stdio.h>

#undef INT_MAX
#define INT_MAX (INT_MIN-1 - 100 + 100)

#define QUOTER(x)   #x
#define STRINGIZER(x)   QUOTER(x)

enum { SIZEOF_INT_AS_STR = sizeof("4294967295,")-1 };
enum { SIZEOF_INT_AS_STR_1 = sizeof(STRINGIZER(INT_MAX) ",")-1 };

int main(void)
{
    printf("size = %d = %d\n", SIZEOF_INT_AS_STR, SIZEOF_INT_AS_STR_1);
    printf("INT_MAX  = %d\n", INT_MAX);
    printf("UINT_MAX = %u\n", UINT_MAX);
    return(0);
}

This doesn't even compile on MacOS X 10.5.8 with GCC 4.0.1 - because the identifier INT_MAX is not defined. A preliminary version of the code that did not print INT_MAX or UINT_MAX worked; it showed that the value of SIZEOF_INT_AS_STR_1 was 31 - so @caf was correct. Adding the double-checking on the values of INT_MAX and UINT_MAX then failed to compile, which did surprise me. A look at the output from gcc -E reveals why:

enum { SIZEOF_INT_AS_STR = sizeof("4294967295,")-1 };
enum { SIZEOF_INT_AS_STR_1 = sizeof("((-INT_MAX - 1)-1 - 100 + 100)" ",")-1 };

int main(void)
{
 printf("size = %d = %d\n", SIZEOF_INT_AS_STR, SIZEOF_INT_AS_STR_1);
 printf("INT_MAX  = %d\n", ((-INT_MAX - 1)-1 - 100 + 100));
 printf("UINT_MAX = %u\n", (((-INT_MAX - 1)-1 - 100 + 100) * 2U + 1U));
 return(0);
}

As predicted, the string for SIZEOF_IN_AS_STR_1 is not a digit string at all. The pre-processor can evaluate the expression (as much as it needs to), but does not have to produce a digit string.

The expansion of INT_MAX turns out to be in terms of INT_MIN, and INT_MIN is, in turn, defined in terms of INT_MAX, so when the rewritten INT_MAX macro is evaluated, the 'recursive expansion' is prevented by the C pre-processor rules of operation, and INT_MAX appears in the pre-processed output - to the confusion of all.

So, there are multiple reasons why it the superficially attractive idea turns out to be a bad idea.

5 Comments

Note that SIZEOF_INT_AS_STR is only valid for systems with 32-bit ints.
Yup: added a comment to that effect, but the range quoted by the asker effectively enforces that, so it meets the specification.
You can use sizeof("," STRINGIZE(INT_MAX)) instead of hard coding it. The stringize macro is a common cpp tool that can be defined as #define STRINGIZE_(v) #v and #define STRINGIZE(v) STRINGIZE_(v).
@R Pate: good idea - yes, you could do that quite effectively. Actually, there are two interesting ideas in there: the use of string concatenation with sizeof() (parentheses required for clarity - but string concatenation happens early enough that the compiler isn't worried) and the use of a stringize operation on INT_MAX.
Using a stringize operation on INT_MAX is not a good idea - it just has to be a "constant expression", not necessarily a sequence of digits. It could be defined as ((1<<32)-1), or even something whacky like __int_max, as long as the compiler lets you use it anywhere you can use a constant expression.
1
unsigned *a; /* your input a[N] */
unsigned i,N;
char *b,*m;
b=m=malloc(1+N*11); /* for each of N numbers: 10 digits plus comma (plus end of string) */
for (i=0;i<N;++i)
  b+=sprintf(b,"%u,",a[i]);
if (N>0) b[-1]=0; /* delete last trailing comma */
/* now use m */
free(m);

pretty, right? :)

Comments

1
char buf [11 * sizeof (my_list)];
for (int n = 0, int j = 0;  j < sizeof (my_list) / sizeof (my_list [0]);  ++j)
    n += sprintf (&buf [n], "%s%u",   (j > 0) ? "," : "",  my_list [j]);

3 Comments

Need to be careful with sprintf here - you don't want a buffer overflow.
Is sprintf safe to call in this way? To make the code comparable to the python posted, you should show the initialization of buf.
Of course one would allocate buf[] to be big enough. 11 times the number of numbers in my_list should do it.
1
#include <stdio.h>
#include <stdlib.h>

/* My approach is to count the length of the string required. And do a single alloc.
     Sure you can allocate more, but I don't know for how long this data will be retained.
*/ 

#define LEN(a) (sizeof a / sizeof *a)

int main(void) {

    unsigned a[] = {1, 23, 45, 523, 544};
    int i, str_len=0, t_written=0;
    char tmp[11]; /* enough to fit the biggest unsigned int */

    for(i = 0; i < LEN(a); i++) 
        str_len += sprintf(tmp, "%d", a[i]);

    /* total: we need LEN(a) - 1 more for the ',' and + 1 for '\0' */
    str_len += LEN(a);
    char *str = malloc(str_len); 
    if (!str) 
        return 0;

    if (LEN(a) > 1) {
        t_written += sprintf(str+t_written, "%d", a[0]);
        for(i = 1; i < LEN(a); i++)
            t_written += sprintf(str+t_written, ",%d", a[i]);
    } else if (LEN(a) == 1) 
        t_written += sprintf(str+t_written, "%d", a[0]);

    printf("%s\n", str);

    free(str);
    return 0;
}

5 Comments

#define LEN(a) (sizeof a / sizeof *a)
How much do you need to change to convert this main program into a reusable function?
I cringe every time I see people storing string or array lengths in int types. What's so terrible about size_t?
Chill out Chris. Everyone knows that sizeof returns size_t. For this demonstration, an int is just fine.
Chris one more thing, I checked your profile, followed your website link, just to find out horrible C code like this: char ext[16], lang = malloc(16*sizeof(char)); while(--start) { / go backwards, find the last period in the filename */ sizeof(char) ? really?lol. How about you check the return value of lang before using it as well? And what's up with the formatting?
1

You guys and your needless special cases to take care of the trailing comma... It's cheaper just to destroy the last comma then to have a conditional check every time the loop runs.

:)

#include <stdio.h>

char* toStr(int arr[], unsigned int arrSize, char buff[])
{
    if (arr && arrSize && buff)
    {
        int* currInt = arr;
        char* currStr = buff;
        while (currInt < (arr + arrSize))
        {
            currStr += sprintf(currStr, "%d,", *currInt++);
        }
        *--currStr = '\0';
    }
    return buff;
}

int main()
{
    int arr[] = {1234, 421, -125, 15251, 15251, 52};
    char buff[1000];

    printf("Arr is:%s\n", toStr(arr, 6, buff));    
}

Assumes buff is big enough allocate it to be (length of largest int + 2) * arrSize). Inspired my memcpy :)

Edit I realized I had a brainfart earlier and could have just incremented by the return value of sprintf instead of storing temp. Apparently other answers do this as well, edited my answer to remove the 2 needless lines.

Edit2 Looks like wrang-wrang beat me to the punch! His answer is pretty much identical to mine and was submitted earlier. I humbly suggest giving him a + 1.

4 Comments

*--currStr = '\0' needs an explanatory comment to anybody who has to maintain the code after you. (i == 0 ? :) does not.
I guess its a matter of opinion, but I +1'd yours for actually being safer about the string len.
This breaks if arrSize == 0. Also you need to s/size/arrSize/, at least.
Still a problem: if arrSize is 0 then you miss setting buff[0] to '\0'.
0

Assuming that when you mention "for logging" you mean writing to a log file, then your solution could look something like this (pseudocoded):

for (int x in array) {
    fprintf(log, "%d", x);
    if (! last element)
        fputc(log, ',');
}

Comments

0

Personally, for simplicity, and probably speed too, I would malloc a large buffer, with space for an array the size of "4,294,967,295" and ", " for each element. It's not space efficient during the creation of the list though!

Then I'd sprintf the ints into there, and append ", " to all elements

At last I'd realloc the pointer to have no more space than required. (size = strlen)

sprintf: On success, the total number of characters written is returned. This count does not include the additional null-character automatically appended at the end of the string.

That's how you keep track of where strcpy to in the string. :)

Hope that helps! :)

And if you just want to print them out, see the other replies instead. (for-loop and printf)

1 Comment

Actually, size = strlen-1, since you don't want the trailing ", ". :)
0

Unfortunately there will always be three cases:

  • Empty list (no commas, no items)
  • One item (no commas, one item)
  • Two or more items (n-1 commas, n items)

The join method hides this complexity for you, which is why it is so nice.

In C, I'd do:

for (i = 0; i < len; i++)
{
    if (i > 0)   /* You do need this separate check, unfortunately. */
        output(",");
    output(item[i]);
}

Where output is however you're appending to the string. It could be as simple as strcat on a pre-allocated buffer, or a printf to some stream (like a memory stream I learned about today in Creating a FILE * stream that results in a string :-).

If you were annoyed about doing that check every time for all i >= 1, you could do:

if (i > 0)
{
    output(item[0]);
    for (i = 1; i < len; i++)
    {
        output(",");
        output(item[i]);
    }
}

1 Comment

Use Duff's Device (see my answer) for how to avoid both the check-per-loop and duplicating output logic.
0

If you want it to a file, Steven Schlansker's answer is fine.

If you want to put it in a string, though, things get more complicated. You can use sprintf, but you need to be careful that you don't run out of room in your string. If you have a C99-compatble snprintf (Linux, BSD, not Windows), the following (untested, uncompiled) code should work:

char *buf = malloc(1024); /* start with 1024 chars */
size_t len = 1024;
int pos = 0;
int rv;
int i;
for (i = 0; i < n; i++) {
    rv = snprintf(buf+pos, len - pos, "%s%d", i = 0 ? "" : ",", my_list[i]);
    if (rv < len - pos) {
        /* it fit */
        pos += rv;
    } else {
        len *= 2;
        buf = realloc(buf, len);
        if (!buf) abort();
        i--; /* decrement i to repeat the last iteration of the loop */
    }
}
return buf;

The caller must then free buf.

2 Comments

Why start with 1024, and why allocate it yourself? Allow the caller to specify the size of his buffer, and keep it his responsibility. Work within what you're given.
Agreed. If caller frees, it makes sense to have the caller allocate too.
0
void join(int arr[], int len, char* sep, char* result){
    if(len==0){
        *result='\0';
    } else {
        itoa(arr[0],result,10);
        if(len > 1){
            strcat(result,sep);
            join(arr+1,len-1,sep,result+strlen(result));
        }
    }
}

Comments

0

Here's a linear solution that allocates an exponentially increasing buffer (fewer calls to realloc, if that matters) for the caller to free. Test script included.

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void ensure(bool pred, char *msg, char *file, int line) {
    if (!pred) {
        fprintf(stderr, "%s:%d: %s", file, line, msg);
        exit(1);
    }
}

char *arr_to_s(int len, int *arr, char *sep) {
    size_t sep_len = strlen(sep);
    int result_capacity = 16 + sep_len;
    int result_len = 0;
    char *result = malloc(result_capacity);
    ensure(result, "malloc", __FILE__, __LINE__);
    result[0] = '\0';
  
    for (int i = 0; i < len; i++) {
        char num[16+sep_len];
        int previous_len = result_len;
        result_len += sprintf(num, i < len - 1 ? "%d%s" : "%d", arr[i], sep);
      
        if (result_len >= result_capacity) {
            result_capacity <<= 1;
            result = realloc(result, result_capacity);
            ensure(result, "realloc", __FILE__, __LINE__);
        }
      
        strcat(result + previous_len, num);
    }
  
    return result;
}

void run_basic_tests(void) {
    int tests[][4] = {
        {0},
        {0, 1},
        {0, 1, 2},
        {0, 42, 2147483647, -2147483648},
    };
    
    for (int i = 0; i < 4; i++) {
        char *s = arr_to_s(i + 1, tests[i], ", ");
        printf("[%s]\n", s);
        free(s);
    }
}

void run_intensive_tests(int n) {
    srand(42);

    for (int i = 0; i < n; i++) {
        int len = rand() % 2000;
        int test[len];

        printf("[");

        for (int j = 0; j < len; j++) {
            test[j] = rand() % 2000000000 - 1000000000;
            printf(j < len - 1 ? "%d," : "%d", test[j]);
        }

        puts("]");
        
        char *s = arr_to_s(len, test, ",");
        printf("[%s]\n", s);
        free(s);
    }
}

int main(void) {
    //run_basic_tests();
    run_intensive_tests(10000);
    return 0;
}

Test runner:

#!/usr/bin/env bash

gcc -std=c99 -pedantic -Wall \
    -Wno-missing-braces -Wextra -Wno-missing-field-initializers -Wformat=2 \
    -Wswitch-default -Wswitch-enum -Wcast-align -Wpointer-arith \
    -Wbad-function-cast -Wstrict-overflow=5 -Wstrict-prototypes -Winline \
    -Wundef -Wnested-externs -Wcast-qual -Wshadow -Wunreachable-code \
    -Wlogical-op -Wfloat-equal -Wstrict-aliasing=2 -Wredundant-decls \
    -Wold-style-definition -Werror \
    -ggdb3 \
    -O0 \
    -fno-omit-frame-pointer -ffloat-store -fno-common -fstrict-aliasing \
    -lm \
    -o arr_to_s.out \
    arr_to_s.c

./arr_to_s.out > arr_to_s_test_out.txt
cat arr_to_s_test_out.txt | awk 'NR % 2 == 1' > arr_to_s_test_expected.txt
cat arr_to_s_test_out.txt | awk 'NR % 2 == 0' > arr_to_s_test_actual.txt
diff arr_to_s_test_expected.txt arr_to_s_test_actual.txt

Valgrind:

==573== HEAP SUMMARY:
==573==     in use at exit: 0 bytes in 0 blocks
==573==   total heap usage: 103,340 allocs, 103,340 frees, 308,215,716 bytes allocated
==573==
==573== All heap blocks were freed -- no leaks are possible

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.