I have input data like this:

<html>...... <!-- OK -->

I only want to extract the data before the comment marker <!--. This is my code:

char *parse_data(char *input) {
    char *parsed_data = malloc(strlen(input) * sizeof(char));
    sscanf(input, "%s<!--%*s", parsed_data);
    return parsed_data;
}

However, it doesn't return the expected result, and I can't figure out why.

Could anyone explain the proper way to extract this kind of data, and how sscanf() behaves here?

Thank you!

1 Answer

The "%s" format specifier does not treat "<!--" as a delimiter, nor does it treat any of its individual characters as delimiters (which would not be the correct behaviour anyway). Only whitespace ends a "%s" conversion, so everything in input up to the first whitespace character is assigned to parsed_data. Scan sets are available in sscanf(), but they take a collection of individual characters rather than a sequence of characters representing a single delimiter, so they cannot match "<!--" as a unit either.
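To make that behaviour concrete, here is a small sketch of both conversions. The helper names first_token() and up_to_angle() and the 64-byte buffer size are made up for this illustration; only sscanf() itself comes from the question.

```c
#include <stdio.h>

/* Hypothetical helper: runs the question's format string with a width limit.
   "%63s" skips leading whitespace and then reads up to the next whitespace
   character; the "<!--" in the format is only matched AFTER that. */
int first_token(const char *input, char out[64])
{
    out[0] = '\0';
    return sscanf(input, "%63s<!--%*s", out);
}

/* Hypothetical helper: a scan set that stops at ANY '<' character,
   not just at the start of "<!--". */
int up_to_angle(const char *input, char out[64])
{
    out[0] = '\0';
    return sscanf(input, "%63[^<]", out);
}
```

With the input from the question, first_token() stores "<html>......" (everything up to the first space), while up_to_angle() stores nothing and returns 0, because the input begins with '<' and a scan set must match at least one character.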

You could use strstr() instead:

const char* comment_start = strstr(input, "<!--");
char* result = 0;
if (comment_start)
{
    result = malloc(comment_start - input + 1);
    memcpy(result, input, comment_start - input);
    result[comment_start - input] = 0;
}

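Note that sizeof(char) is guaranteed to be 1, so it can be omitted from the malloc() argument calculation. The allocation does need one extra byte for the terminating null character, which the + 1 above provides.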
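Folded back into the function from the question, the strstr() approach might look like the sketch below. Two things here are my assumptions rather than part of the snippet above: when no "<!--" is present, the whole input is copied instead of leaving the result null, and the function returns NULL if malloc() fails.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of the text that precedes the first "<!--".
   Assumption: if there is no "<!--", the entire input is copied.
   Returns NULL if allocation fails. The caller must free() the result. */
char *parse_data(const char *input)
{
    const char *comment_start = strstr(input, "<!--");
    size_t len = comment_start ? (size_t)(comment_start - input)
                               : strlen(input);

    char *result = malloc(len + 1);   /* + 1 for the terminating '\0' */
    if (result == NULL)
        return NULL;

    memcpy(result, input, len);
    result[len] = '\0';
    return result;
}
```

Called with the input from the question, this returns "<html>...... " including the trailing space before the comment; trim it separately if that is not wanted.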
