2

I have the following code: function get_unlimited_input allocates a new string if NULL was passed, otherwise it just appends characters to the existing string. In the end it truncates excess bytes. (DEFAULT_BUFFER_SIZE was set to 5 to test case of many reallocations)

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#define DEFAULT_BUFFER_SIZE 5

char *get_unlimited_input(char *buffer) {
    size_t current_size;
    if (buffer == NULL) {
        buffer = malloc(DEFAULT_BUFFER_SIZE * sizeof(char));
        current_size = DEFAULT_BUFFER_SIZE;
    } else {
        current_size = strlen(buffer) + DEFAULT_BUFFER_SIZE;
    }
    char *cursor = buffer + current_size - DEFAULT_BUFFER_SIZE;
    for (;;) {
        int current = getchar();
        *cursor = (char)current;
        cursor++;
        if (current == '\n' || current == EOF)
            break;
        if (cursor >= buffer + current_size) {
            current_size += DEFAULT_BUFFER_SIZE;
            buffer = realloc(buffer, current_size);
            cursor = buffer + current_size - DEFAULT_BUFFER_SIZE;
        }
    }
    *cursor = '\0';
    buffer = realloc(buffer, cursor - buffer);
    return buffer;
}

int main() {
    printf(">");
    char *buffer = get_unlimited_input(NULL);
    printf(">");
    get_unlimited_input(buffer);
}

In most cases it works just fine, but if I pass 117 characters first, and then 12 it crashes:

>.....................................................................................................................
>............
realloc(): invalid next size
Aborted (core dumped)
python3 -c "print('.'*117+'\n'+'.'*12)" | ./_buffer
realloc(): invalid next size
Aborted (core dumped)

What is the problem?

2
  • 4
    I guess realloc(buffer, cursor - buffer); -> realloc(buffer, cursor - buffer + 1);. Compile your program with -fsanitize=address and check for stack overflows. Commented Jan 25, 2021 at 19:36
  • OT: the function: realloc() can fail. Therefore, always assigned the returned pointer to a temp variable, check that temp variable for NULL. If NULL then handle error, else assign temp to the target variable. If this check is not made, then the result is a unrecoverable memory leak Commented Jan 25, 2021 at 23:46

4 Answers 4

3

Among other problems, you trim all the extra space from the buffer before you return it. But if you pass in a buffer to the function, you assume it still has the extra space. Thus you must not pass the buffer returned from the function back to the function. But you do exactly that.

    } else {
        current_size = strlen(buffer) + DEFAULT_BUFFER_SIZE;
    }
...
    buffer = realloc(buffer, cursor - buffer);

Also, as pointed out by KamilCuk, you don't leave space for the terminator in the returned string, so calling strlen on it isn't safe.

You should document what the requirements on the input to the function are and what requirements the output of the function is guaranteed to satisfy. That makes spotting mistakes like this much easier.

As soon as you see, "if a buffer is passed to this function, it must have extra space" and "the buffer returned from this function does not have any extra space", it's obvious that you cannot pass the returned buffer back to the function because the output guarantees do not meet the input requirements.

Sign up to request clarification or add additional context in comments.

Comments

2

The basic problem is that when you call get_unlimited_input with a non-null pointer (preexisting buffer from an earlier call), it assumes the buffer size is strlen(buffer) + DEFAULT_BUFFER_SIZE, which is wrong. The previous call will actually have realloced the buffer to match the length of the string not including the terminating NUL (which means that the terminating NUL itself may well be lost.)

You can fix these by incrementing by incrementing cursor after storing a NUL and before reallocing (so the realloc will be large enough), and then changing the calculation of current_size when passed a non-null pointer to be strlen(buffer) + 1

Another problem is that when you get an EOF from getchar, you are then casting that EOF to a char and storing it in the buffer. A EOF is not a valid character -- the whole point of having getchar return an int instead of a char is so that it can signal EOF as distinct from any character. Thus on EOF you are storing some random garbage character (or non-character) in the buffer, which might just show up as garbage, or might cause a crash or error on output (depending on the system).

Comments

1

There are multiple problems in your code, causing the heap to become corrupted as the diagnostic indicates:

  • your assumption about the currently allocated size is incorrect: current_size = strlen(buffer) + DEFAULT_BUFFER_SIZE; is too optimistic. Since you reallocate the buffer to cursor - buffer bytes before returning it, there is no slack at the end of the string.

  • you test for '\n' and EOF after storing the byte to the array. This may be the intended behavior for the newline, but it is incorrect for EOF, which is not a character.

  • reallocating the buffer with buffer = realloc(buffer, cursor - buffer); is incorrect too: cursor points to the null terminator, hence you should use a size of cursor + 1 - buffer to keep the null terminator inside the allocated block.

Here is a modified version:

#include <stdio.h>
#include <stdlib.h>

#define DEFAULT_BUFFER_SIZE  16  /* use address alignment as incremental size */

char *get_unlimited_input(char *buffer) {
    size_t current_size, pos;
    char *p;

    if (buffer == NULL) {
        pos = 0;
        current_size = DEFAULT_BUFFER_SIZE;
        buffer = malloc(DEFAULT_BUFFER_SIZE);
        if (buffer == NULL)
            return NULL;
    } else {
        pos = strlen(buffer);
        current_size = pos + 1;
    }
    for (;;) {
        int c = getchar();
        if (c == EOF || c == '\0')
            break;
        if (pos + 1 == current_size) {
            // reallocate the buffer
            current_size += DEFAULT_BUFFER_SIZE;
            p = realloc(buffer, current_size);
            if (p == NULL)
                break;
            buffer = p;
        }
        buffer[pos++] = (char)c;
        if (c == '\n')
            break;
    }
    buffer[pos] = '\0';
    p = realloc(buffer, pos + 1);
    return (p != NULL) ? p : buffer;
}

int main() {
    printf("> ");
    char *buffer = get_unlimited_input(NULL);
    printf("got: %s\n", buffer);
    printf("> ");
    get_unlimited_input(buffer);
    printf("got: %s\n", buffer);
    return 0;
}

Comments

1

If you get a runtime error from malloc, realloc or free, it means you've corrupted the heap. Common reasons for heap corruption include using a memory block after it has been freed (this includes calling free twice) and buffer overflows (and underflows).

The corruption happened before the runtime error. It may have happened a long time before, so if you just start debugging the program at the point the error happens, it may be difficult to reconstruct what happened. There are other tools that can help you locate the problem more precisely. On Unix-like systems with GCC or Clang, AddressSanitizer is pretty useful.

# Settings for Ubuntu 20.04; you may need to adapt for your system
$ export ASAN_OPTIONS=symbolize=1 ASAN_SYMBOLIZER_PATH=/usr/lib/llvm-10/bin/llvm-symbolizer
$ gcc -O -Wall -Wextra a.c -fsanitize=address,undefined && python3 -c "print('.'*117+'\n'+'.'*12)" | ./a.out
=================================================================
==446177==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60c000000236 at pc 0x7f246fc0ea6d bp 0x7ffd4e309380 sp 0x7ffd4e308b28
READ of size 119 at 0x60c000000236 thread T0
    #0 0x7f246fc0ea6c  (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x67a6c)
    #1 0x55c04dfb32e7 in get_unlimited_input (.../65891246/a.out+0x12e7)
    #2 0x55c04dfb34b1 in main (.../65891246/a.out+0x14b1)
    #3 0x7f246f06f0b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)
    #4 0x55c04dfb320d in _start (.../65891246/a.out+0x120d)

0x60c000000236 is located 0 bytes to the right of 118-byte region [0x60c0000001c0,0x60c000000236)
allocated by thread T0 here:
    #0 0x7f246fcb4ffe in __interceptor_realloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x10dffe)
    #1 0x55c04dfb345c in get_unlimited_input (.../65891246/a.out+0x145c)

SUMMARY: AddressSanitizer: heap-buffer-overflow (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x67a6c) 
Shadow bytes around the buggy address:
  0x0c187fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c187fff8000: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x0c187fff8010: fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa fa
  0x0c187fff8020: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
  0x0c187fff8030: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
=>0x0c187fff8040: 00 00 00 00 00 00[06]fa fa fa fa fa fa fa fa fa
  0x0c187fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c187fff8060: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c187fff8070: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c187fff8080: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c187fff8090: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==446177==ABORTING

You. The most important parts are the stack trace, and the indication that buffer overflow happened on a “118-byte region”, which suggests that it's happening at the very end of the first invocation of get_unlimited_input or at the very beginning of the second one. The stack trace gives you the exact code address at which the overflow happens, which you can use to set a breakpoint in a debugger; you'll see that it's near the end of the function. As others have already noted,

    *cursor = '\0';
    buffer = realloc(buffer, cursor - buffer);

is wrong: you aren't leaving room for the '\0' terminator. You need

    *(cursor++) = '\0';
    buffer = realloc(buffer, cursor - buffer);

or

    *cursor = '\0';
    buffer = realloc(buffer, cursor - buffer + 1);

I haven't reviewed for other bugs (it isn't the only one).

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.