2

I have a string, and in it I need to find a substring and replace it. The one to be found and the one that'll replace it are of different length. My code, partially:

char *source_str = "aaa bbb CcCc dddd kkkk xxx yyyy";
char *pattern = "cccc";
char *new_sub_s = "mmmmm4343afdsafd";

char *sub_s1 = strcasestr(source_str, pattern);

printf("sub_s1: %s\r\n", sub_s1);
printf("sub_str before pattern: %s\r\n", sub_s1 - source_str); // Memory corruption

char *new_str = (char *)malloc(strlen(source_str) - strlen(pattern) + strlen(new_sub_s) + 1);

strcat(new_str, '\0');
strcat(new_str, "??? part before pattern ???");
strcat(new_str, new_sub_s);
strcat(new_str, "??? part after pattern ???");
  1. Why do I have memory corruption?

  2. How do I effective extract and replace pattern with new_sub_s?

3
  • Better look here first geeksforgeeks.org/… Commented Aug 18, 2020 at 15:21
  • @user3121023 I have this strcat(new_str, '\0'); Commented Aug 18, 2020 at 15:27
  • 1
    @kosmosu two problems, strcat() assumes new_str is null-terminated, which as user3121023 pointed out, isn't the case with malloc(). Secondly, '\0' is a char, not a char*. Commented Aug 18, 2020 at 15:28

3 Answers 3

3

There are multiple problems in your code:

  • you do not test if sub_s1 was found in the string. What if there is no match?
  • printf("sub_str before pattern: %s\r\n", sub_s1 - source_str); passes a difference of pointers for %s that expects a string. The behavior is undefined.
  • strcat(new_str, '\0'); has undefined behavior because the destination string is uninitialized and you pass a null pointer as the string to concatenate. strcat expects a string pointer as its second argument, not a char, and '\0' is a character constant with type int (in C) and value 0, which the compiler will convert to a null pointer, with or without a warning. You probably meant to write *new_str = '\0';

You cannot compose the new string with strcat as posted: because the string before the match is not a C string, it is a fragment of a C string. You should instead determine the lengths of the different parts of the source string and use memcpy to copy fragments with explicit lengths.

Here is an example:

char *patch_string(const char *source_str, const char *pattern, const char *replacement) {
    char *match = strcasestr(source_str, pattern);
    if (match != NULL) {
        size_t len = strlen(source_str);
        size_t n1 = match - source_str;   // # bytes before the match
        size_t n2 = strlen(pattern);      // # bytes in the pattern string
        size_t n3 = strlen(replacement);  // # bytes in the replacement string
        size_t n4 = len - n1 - n2;        // # bytes after the pattern in the source string
        char *result = malloc(n1 + n3 + n4 + 1);
        if (result != NULL) {
            // copy the initial portion
            memcpy(result, source_str, n1);
            // copy the replacement string
            memcpy(result + n1, replacement, n3);
            // copy the trailing bytes, including the null terminator
            memcpy(result + n1 + n3, match + n2, n4 + 1);
        }
        return result;
    } else {
        return strdup(source_str);  // always return an allocated string
    }
}

Note that the above code assumes that the match in the source string has be same length as the pattern string (in the example, strings "cccc" an "CcCc" have the same length). Given that strcasestr is expected to perform a case independent search, which is confirmed by the example strings in the question, it might be possible that this assumption fail, for example if the encoding of upper and lower case letters have a different length, or if accents are matched by strcasestr as would be expected in French: "é" and "E" should match but have a different length when encoded in UTF-8. If strcasestr has this advanced behavior, it is not possible to determine the length of the matched portion of the source string without a more elaborate API.

Sign up to request clarification or add additional context in comments.

15 Comments

You cannot compose the new string with strcat as posted. why not?
@kosmosu: I expanded the answer with more explanations.
Don't know whether this was just an oversight, but strcat(new_str, '\0'); shouldn't even compile.
@PatrickRoberts: '\0' is a character constant with type int and value 0, hence converts to a null character pointer if the prototype for strcat() is in scope. This is utmostly confusing, and might be flagged by the compiler with a warning. I shall be more explicit about this error.
@chqrlie you're right, I forgot about that particular footgun...
|
1
printf("sub_str before pattern: %s\r\n", sub_s1 - source_str); // Memory corruption

You're taking the difference of two pointers, and printing it as though it was a pointer to a string. In practice, on your machine, this probably calculates a meaningless number and interprets it as a memory address. Since this is a small number, when interpreted as an address, on your system, this probably points to unmapped memory, so your program crashes. Depending on the platform, on the compiler, on optimization settings, on what else there is in your program, and on the phase of the Moon, anything could happen. It's undefined behavior.

Any half-decent compiler would tell you that there's a type mismatch between the %s directive and the argument. Turn those warnings on. For example, with GCC:

gcc -Wall -Wextra -Werror -O my_program.c
char *new_str = (char *)malloc(…);
strcat(new_str, '\0');
strcat(new_str, "…");

The first call to strcat attempts to append '\0'. This is a character, not a string. It happens that since this is the character 0, and C doesn't distinguish between characters and numbers, this is just a weird way of writing the integer 0. And any integer constant with the value 0 is a valid way of writing a null pointer constant. So strcat(new_str, '\0') is equivalent to strcat(new_str, NULL) which will probably crash due to attempting to dereference the null pointer. Depending on the compiler optimizations, it's possible that the compiler will think that this block of code is never executed, since it's attempting to dereference a null pointer, and this is undefined behavior: as far as the compiler is concerned, this can't happen. This is a case where you can plausibly expect that the undefined behavior causes the compiler to do something that looks preposterous, but makes perfect sense from the way the compiler sees the program.

Even if you'd written strcat(new_str, "\0") as you probably intended, that would be pointless. Note that "\0" is a pointless way of writing "": there's always a null terminator at the end of a string literal¹. And appending an empty string to a string wouldn't change it.

And there's another problem with the strcat calls. At this point, the content of new_str is not initialized. But strcat (if called correctly, even for strcat(new_str, ""), if the compiler doesn't optimize this away) will explore this uninitialized memory and look for the first null byte. Because the memory is uninitialized, there's no guarantee that there is a null byte in the allocated memory, so strcat may attempt to read from an unmapped address when it runs out of buffer, or it may corrupt whatever. Or it may make demons fly out of your nose: once again it's undefined behavior.

Before you do anything with the newly allocated memory area, make it contain the empty string: set the first character to 0. And before that, check that malloc succeeded. It will always succeed in your toy program, but not in the real world.

char *new_str = malloc(…);
if (new_str == NULL) {
    return NULL; // or whatever you want to do to handle the error
}
new_str[0] = 0;
strcat(new_str, …);

¹ The only time there isn't a null pointer at the end of a "…" is when you use this to initialize an array and the characters that are spelled out fill the whole array without leaving room for a null terminator.

2 Comments

And there's another problem with the strcat calls. At this point, the content of new_str is not initialized. --> should have I used calloc instead?
@kosmosu You can, yes. Using malloc and initializing the first byte is enough if you know what you're doing. Using calloc is marginally slower, but that rarely matters, and it's safer. I work on security-critical code and we forbid malloc, we only allow calloc. Other projects, especially ones that allocate very large blocks of memory, have different rules.
1

snprintf can be used to calculate the memory needed and then print the string to the allocated pointer.

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main ( void) {
    char *source_str = "aaa bbb CcCc dddd kkkk xxx yyyy";
    char *pattern = "cccc";
    char *new_sub_s = "mmmmm4343afdsafd";

    char *sub_s1 = strcasestr(source_str, pattern);
    int span = (int)( sub_s1 - source_str);
    char *tail = sub_s1 + strlen ( pattern);

    size_t size = snprintf ( NULL, 0, "%.*s%s%s", span, source_str, new_sub_s, tail);

    char *new_str = malloc( size + 1);

    snprintf ( new_str, size, "%.*s%s%s", span, source_str, new_sub_s, tail);

    printf ( "%s\n", new_str);

    free ( new_str);

    return 0;
}

1 Comment

GNU systems also have asprintf() which is an even more direct solution.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.