4

I have a problem that is probably fairly common and likely has a beautiful hack that I am not aware of. I would greatly appreciate it if someone would enlighten me!

I am using C's sscanf() function to parse input and the format is "%d %d %d %s %d %s %d ..." where the first two %d are random ID integers (insignificant) for the string and the third is a count of the number of %d %s combinations to follow.

For instance, "12 34 2 3 yes 2 no" could be a string, where 12 and 34 are random IDs (unimportant to the problem) and 2 specifies that two combinations follow: '3 yes' and '2 no'. The 3 preceding 'yes' specifies the length of the string that follows it, and the same is true of the 2 before 'no'. There can be a variable number of these combinations, and we want to catch them all with sscanf.

Does anyone know of any way to do this with sscanf?

Thanks a lot!

5
  • If the C++ tag is accurate, you probably want to use a stringstream instead. It'll make this rather easier. Commented Mar 31, 2013 at 2:03
  • Yes, I corrected it. Sorry. Commented Mar 31, 2013 at 2:07
  • @TimHaggard: Is this a C question or a C++ question? They are different languages. Commented Mar 31, 2013 at 2:57
  • 1
    @nneonneo: The scanf function is part of the C Standard and also part of the C++ Standard. It is appropriate to tag scanf() questions with either or both languages. Commented Mar 31, 2013 at 3:25
  • @Ben But depending on which language is used, a good answer would be rather different, which would warrant two different questions/answers. Combining the two doesn't really help imo. Commented Mar 31, 2013 at 4:45

4 Answers

4

Just parse the string in two (or more) passes. This uses the %n format specifier to write the number of bytes processed, so we know where to pick up in subsequent passes:

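#include <assert.h>
#include <stdio.h>

int main(void) {
    int a, b, n, pos;
    const char *buf = "12 34 2 3 yes 2 no";

    /* %n stores the number of characters consumed so far; it is not counted in the return value. */
    int rc = sscanf(buf, "%d %d %d %n", &a, &b, &n, &pos);
    assert(rc == 3);  /* sscanf is called outside assert so it still runs under NDEBUG */
    for (int i = 0; i < n; i++) {
        int cur;  /* characters consumed by this pass */
        int x;
        char y[20];

        rc = sscanf(buf + pos, "%d %19s %n", &x, y, &cur);
        assert(rc == 2);
        printf("%d %s\n", x, y);
        pos += cur;  /* resume where this pass stopped */
    }
    return 0;
}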

outputs

3 yes
2 no
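
As one of the comments on this answer points out, real code would report an error rather than trip an assertion. The following is a minimal sketch of the same two-pass %n technique wrapped in a function that returns -1 on malformed input. The names parse_record, struct pair, MAX_PAIRS and WORD_LEN are hypothetical and chosen only for illustration, as is the cap of 16 pairs.

```c
#include <stdio.h>

#define MAX_PAIRS 16   /* hypothetical cap on pairs per record */
#define WORD_LEN  20   /* buffer size matching the %19s width below */

struct pair {
    int  len;              /* the integer preceding each word */
    char word[WORD_LEN];   /* the word itself */
};

/* Parses "id id count (len word)*" from buf into out[].
   Returns the number of pairs stored, or -1 if the input is malformed. */
int parse_record(const char *buf, struct pair *out, int max_pairs)
{
    int a, b, n, pos = 0;

    /* %n is not counted in the return value, so three conversions are expected. */
    if (sscanf(buf, "%d %d %d %n", &a, &b, &n, &pos) != 3)
        return -1;
    if (n < 0 || n > max_pairs)
        return -1;

    for (int i = 0; i < n; i++) {
        int cur = 0;  /* characters consumed by this pass */
        if (sscanf(buf + pos, "%d %19s %n", &out[i].len, out[i].word, &cur) != 2)
            return -1;
        pos += cur;   /* continue from where this pass stopped */
    }
    return n;
}
```

Called with the string from the question and an array of MAX_PAIRS elements, this should store the pairs (3, "yes") and (2, "no") and return 2. A record that ends before all announced pairs have been read yields -1.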

9 Comments

I'd wrap the sscanf calls in assert, so that I know when the first returns something other than 3 and the second returns something other than 2. Aside from that, +1 because this is my favourite answer in the realm of C programming.
Apparently the effect of %n on the return value is somewhat undefined (ew). So I will add asserts for >=3/>=2.
As an aside, this may in fact be the only time I've ever found %n useful (outside of exploiting format string vulnerabilities to write arbitrary data to arbitrary stack addresses).
No, it's well defined. The C standard says "Execution of a %n directive does not increment the assignment count returned at the completion of execution of the fscanf function." The %n directive will always execute providing the previous directives succeed, which is why I suggested assert. Ideally, one would report an error instead of raising an assertion failure. However, including error reporting code in such an example takes the focus away from the direction of the solution. assert will hopefully bring that focus back, when errors do occur.
%n is also useful when you need the number of decimal digits in an integer. Many people will use code such as assert(scanf("%d", &val) == 1); do { decimal_count++; val /= 10; } while (val > 0); (erroneously, for negative values) when all that is necessary is assert(scanf("%d%n", &val, &decimal_count) == 1);. Note that %n counts every character consumed, so any leading whitespace or sign is included.
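
To make the last comment concrete, here is a small sketch of the "%d%n" idea. It uses sscanf on a string rather than scanf on standard input so that it is easy to try, and the function name chars_consumed_by_int is hypothetical. The count it reports is the number of characters consumed, which equals the digit count only when the input has no leading whitespace or sign.

```c
#include <stdio.h>

/* Reads one int from s and returns the number of characters sscanf
   consumed while doing so (leading whitespace and sign included),
   or -1 if no integer could be read. */
int chars_consumed_by_int(const char *s, int *val)
{
    int count = 0;

    /* %n does not add to the return value, so success is exactly 1. */
    if (sscanf(s, "%d%n", val, &count) != 1)
        return -1;
    return count;
}
```

For "12345" this should return 5. For "-42" it should return 3, because the sign is a consumed character too.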
1

There's no convenient way to do this with just sscanf. You'd need to dynamically generate the format string itself, before passing it to sscanf.

You might want to consider writing a specialized parsing routine for this where you call sscanf in a loop, or preferably (since you specify the C++ tag) using a std::istringstream in a loop.

1 Comment

I meant sscanf. Sorry about the confusion!
0

First of all, sprintf() is used to generate a string, not to parse one. You should use sscanf. I do not know how to do it in a single sscanf() call, but you can do it in a loop:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void){
    int choice_id, count;
    char choice[20], id_1[20], id_2[20], count_str[20], choice_id_str[20];
    int index;
    const char *input = "12 34 2 3 yes 2 no";

    // Read the three header fields as strings so their lengths are known.
    if (sscanf(input, "%19s %19s %19s", id_1, id_2, count_str) != 3)
        return 1;
    // Skip the header; this assumes fields are separated by exactly one space.
    input += strlen(id_1)+strlen(id_2)+strlen(count_str)+2;
    count = atoi(count_str);
    for(index = 0; index< count; ++index){
        if (sscanf(input, " %19s %19s", choice_id_str, choice) != 2)
            return 1;
        choice_id = atoi(choice_id_str);
        // Process or store the record
        printf("%d: %s\n",choice_id, choice);
        // Same single-space assumption as above.
        input += strlen(choice_id_str) + strlen(choice) + 2;
    }
    return 0;
}

Compiled with gcc (GCC) 4.1.2 and run on Linux, the output is:

-bash-3.2$ ./a.out
3: yes
2: no

4 Comments

The input here is the string to parse, such as "12 34 2 3 yes 2 no".
It is not clear how to move the pointer. scanf may consume an arbitrary amount of the input string.
That's incorrect...a space in sscanf could match an arbitrary amount of whitespace (not just one character).
I did not notice that. But I think it still works well. I updated my code into a full example.
0

Is there a maximum value for the number of %d %s pairs that follow the initial "%d %d %d" header? There must be since you're putting the values somewhere, perhaps in a struct { int i; char s[j]; } a[n]; with appropriate j and n sizes.

(BTW, your example uses "%d %d %d %s %d %s %d ..." when the description indicates it should be "%d %d %d %d %s %d %s %d %s ...", 3 decimals, followed by decimal/string pairs)

If there is a maximum, just create a format string sized for that maximum and check the return value of sscanf, which is the number of input items successfully matched and assigned. After subtracting the 3 header items, it should equal twice your 3rd value, since each pair contributes two items. If the return value doesn't match the 3rd int in this way, report a malformed line.

I once made something like this a function, so that the a[n] array was automatic on the stack; the function then allocated a linked list from the heap for the items sscanf-ed and returned a pointer to the first item.
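
The fixed-maximum approach described in this answer might look roughly like the sketch below. It assumes a hypothetical maximum of 3 pairs; the names parse_fixed, struct item and MAX_PAIRS are illustrative only, and the format string has to be written out by hand to match that maximum. The linked-list step is left out.

```c
#include <stdio.h>

#define MAX_PAIRS 3    /* hypothetical maximum, chosen for illustration */

struct item { int i; char s[20]; };

/* Parses a whole line with one sscanf call using a format string sized
   for MAX_PAIRS pairs. Returns the pair count, or -1 for a malformed line. */
int parse_fixed(const char *line, struct item a[MAX_PAIRS])
{
    int id1, id2, n;
    int rc = sscanf(line, "%d %d %d %d %19s %d %19s %d %19s",
                    &id1, &id2, &n,
                    &a[0].i, a[0].s, &a[1].i, a[1].s, &a[2].i, a[2].s);

    /* n is only valid if at least the three header items were read. */
    if (rc < 3 || n < 0 || n > MAX_PAIRS)
        return -1;
    /* Expect 3 header items plus 2 items per announced pair. */
    if (rc != 3 + 2 * n)
        return -1;
    return n;
}
```

With the string from the question this should return 2, with a[0] holding (3, "yes") and a[1] holding (2, "no"). A line whose pair count disagrees with its third integer is rejected. The trade-off compared with the looped %n approach is that the maximum is baked into the format string.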
