I want to create a function in C that gets a substring from a string. This is what I have so far:

char* substr(char* src, int start, int len){
    char* sub = malloc(sizeof(char)*(len+1));
    memcpy(sub, &src[start], len);
    sub[len] = '\0';
    return sub;
}

int main(){
    char* test = malloc(sizeof(char)*5); // the reason I don't use char* = "test"; is because I wouldn't be able to use free() on it then
    strcpy(test, "test");
    char* sub = substr(test, 1, 2); // save the substr in a new char*
    free(test); // just wanted the substr from test
    printf("%s\n", sub); // prints "es"

    // ... free when done with sub
    free(sub);
}

Is there any way I can save the substring into test without having to create a new char*? If I do test = substr(test, 1, 2), the old value of test no longer has a pointer pointing to it, so it's leaked memory (I think. I'm a noob when it comes to C languages.)

  • I don't think so, you'd need to have \0 at the end, and that's only appropriate if the substring takes all the length. I think it should be perfectly possible to have that kind of substring (const char* substr(const char* src, int start).), and this detail should be documented (because the initial string must not be changed anymore or the substring will just change too inexplicably, so to speak). Commented Apr 3, 2015 at 20:25
  • You don't have to allocate new memory. Just use memmove. Commented Apr 3, 2015 at 20:27
  • And to test for memory leaks, simply use valgrind. It is simple to use: just run your program with valgrind ./path/to/prog. Compile with -g and valgrind will identify lines with problems for you. Commented Apr 3, 2015 at 20:39
  • @Jongware So if I use memmove and try memmove(test, &test[1], 2); test = realloc(test, sizeof(char)*2); //to reduce the size to an appropriate length, esst gets printed. I assume this means that 5 bytes are still allocated for test instead of 3 (the 2 chars and the null terminator). Commented Apr 3, 2015 at 20:39
  • If you removed the terminating Zero code as well, yes. So why would you do that? Commented Apr 3, 2015 at 20:43

5 Answers

void substr(char* str, char* sub , int start, int len){
    memcpy(sub, &str[start], len);
    sub[len] = '\0';
}

int main(void)
{
    char *test = (char*)malloc(sizeof(char)*5);
    char *sub = (char*)malloc(sizeof(char)*3);
    strcpy(test, "test");
    substr(test, sub, 1, 2);

    printf("%s\n", sub); // prints "es"
    free(test);
    free(sub);

    return 0;
}

3 Comments

Wouldn't this lead to a memory leak? Although test is pointing to a new value on the line test = substr(test, 1, 2); , the old value for test has nothing pointing to it now and I have no way of freeing it.
@BillLynch hope this would help... u dont need to create a new string in substr function...just memcpy it in subtr() and if you want substring in same string just pass substr(test, test, 1, 2)
@MohsinLatif: Your edit is what I would personally prefer the interface to look like. And now it doesn't leak either :)

Well, you could always keep the address of the malloc'd memory in a separate pointer:

char* test = malloc(~~~)
char* toFree = test;
test = substr(test,1,2);
free(toFree);

But most of the work of shuffling this sort of data around has already been done in string.h. One of those functions probably does the job you want done. memmove(), as others have pointed out, could move the substring to the start of your buffer; write the terminating '\0' right after it and, voilà!
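
A minimal sketch of that in-place approach, assuming the string lives in memory obtained from malloc. The name substr_inplace and its return convention are made up for illustration; nothing like it exists in string.h.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: shrink the heap string *s to the substring
 * [start, start + len) in place.
 * Returns 0 on success, -1 if the range falls outside the string. */
int substr_inplace(char **s, size_t start, size_t len)
{
    size_t n = strlen(*s);
    if (start > n || len > n - start)
        return -1;

    memmove(*s, *s + start, len);      /* regions may overlap, so memmove, not memcpy */
    (*s)[len] = '\0';                  /* terminate before shrinking */

    char *tmp = realloc(*s, len + 1);  /* optional: hand back the unused tail */
    if (tmp)
        *s = tmp;                      /* if realloc fails, the old block is still valid */
    return 0;
}
```

With test holding "test", calling substr_inplace(&test, 1, 2) leaves "es" in the same variable, and a single free(test) at the end releases everything. Writing the terminator before the realloc is what avoids the "esst" output mentioned in the comments under the question.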

If you specifically want to make a new dynamic string to play with while keeping the original separate and safe, and also want to be able to overlap these pointers.... that's tricky. You could probably do it if you passed in the source and destination and then range-checked the affected memory, and free'd the source if there was overlap... but that seems a little over-complicated.

I'm also loath to malloc memory that I trust higher levels to free, but that's probably just me.

As an aside,

char* test = "test";

Is one of those niche cases in C. When you initialize a pointer to a string literal (stuff in quotes), it puts the data in a special section of memory just for text data. You can (rarely) edit it, but you shouldn't, and it can't grow.

5 Comments

You can not change the contents of a string literal!
It's implementation dependent! But yeah, I was probably thinking of how code-composer for the MSP430 handles it. Typically it's read-only. Good point. ADDITIONAL YELLING
It's undefined behavior, not implementation defined behavior. C 2011 Section 6.4.5 Paragraph 7: It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.
@BillLynch dude, are you kidding me? If some behaviour is not defined by a standard (why are you assuming C11 anyway?) but the compiler and environment allow it to happen anyway, what do you think the end results depends upon?
The same sentence can be found in C99 Section 6.4.5 Paragraph 6 if you'd rather. I don't have a copy of C90 lying around but I imagine the same line is there as well.

There are a number of ways to do this, and the way you approached it is a good one, but there are several areas where you seemed a bit confused. First, there is no need to allocate test. Simply using a pointer is fine. You could simply do char *test = "test"; in your example. No need to free it then either.

Next, when you are beginning to allocate memory dynamically, you need to always check the return to make sure your allocation succeeded. Otherwise, you can easily segfault if you attempt to write to a memory location when there has been no memory allocated.

In your substr, you should also validate the range of start and len you send to the function to ensure you are not attempting to read past the end of the string.

When dealing with only positive numbers, it is better to use type size_t or unsigned. There will never be a negative start or len in your code, so size_t fits the purpose nicely.

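Lastly, keep track of which pointers still refer to allocated memory so that you never free the same block twice. A check such as if (sub) free (sub); only skips the call when sub is NULL (and free(NULL) is a no-op anyway); it cannot tell whether a non-NULL block was already freed, so set the pointer to NULL once you have freed it.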
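
One caveat on the check above: a non-NULL test cannot detect a block that was already freed, and free(NULL) is defined to do nothing. A small sketch of the usual idiom, with free_and_clear being a made-up name for illustration:

```c
#include <stdlib.h>

/* Hypothetical helper: free the block and clear the caller's pointer,
 * so that a second call degenerates into free(NULL), which is a no-op. */
void free_and_clear(char **p)
{
    free(*p);
    *p = NULL;
}
```

Calling free_and_clear(&sub) twice in a row is then harmless, whereas calling free(sub) twice on the same non-NULL pointer is undefined behavior.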

Take a look at the following and let me know if you have questions. I changed the code to accept command line arguments from string, start and len, so the use is:

./progname the_string_to_get_sub_from start len

I hope the following helps.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char* substr (char* src, size_t start, size_t len)
{
    /* validate indexes */
    size_t src_len = strlen (src);
    if (start > src_len || len > src_len - start) {   /* written so start + len cannot wrap around */
        fprintf (stderr, "%s() error: invalid substring index (start+len > length).\n", __func__);
        return NULL;
    }

    char* sub = calloc (1, len + 1);

    /* validate allocation */
    if (!sub) {
        fprintf (stderr, "%s() error: memory allocation failed.\n", __func__);
        return NULL;
    }

    memcpy (sub, src + start, len);
    // sub[len] = '\0';             /* by using calloc, sub is filled with 0 (null) */

    return sub;
}

int main (int argc, char **argv) {

    if (argc < 4 ) {
        fprintf (stderr, "error: insufficient input, usage: %s string ss_start ss_length\n", argv[0]);
        return 1;
    }

    char* test = argv[1];           /* no need to allocate test, a pointer is fine  */

    size_t ss_start  = (size_t)atoi (argv[2]);      /* convert start & length from  */
    size_t ss_length = (size_t)atoi (argv[3]);      /* the command line arguments   */

    char* sub = substr (test, ss_start, ss_length);

    if (sub)                                        /* validate sub before use  */
        printf("\n sub: %s\n\n", sub);

    if (sub)                                        /* validate sub before free */
        free(sub);

    return 0;
}
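
A note on the atoi conversions above: atoi reports no errors, and a negative argument such as -1 turns into a huge size_t after the cast. A possible stricter parser, sketched here with strtoul; parse_size is a hypothetical helper, not part of the program above.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse a non-negative decimal argument into *out.
 * Returns 0 on success, -1 if the text is not a valid size. */
int parse_size(const char *text, size_t *out)
{
    /* strtoul accepts a leading '-' and wraps the value, so insist on a digit */
    if (!isdigit((unsigned char)text[0]))
        return -1;

    char *end;
    errno = 0;
    unsigned long v = strtoul(text, &end, 10);
    if (errno == ERANGE || *end != '\0' || v > SIZE_MAX)
        return -1;                       /* overflow or trailing junk */

    *out = (size_t)v;
    return 0;
}
```

In main, each (size_t)atoi (argv[i]) could then become a parse_size call whose failure prints the usage message and returns 1.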

Output

$ ./bin/str_substr test 1 2

 sub: es

If you choose an invalid start / len combination:

$ ./bin/str_substr test 1 4
substr() error: invalid substring index (start+len > length).

Verify All Memory Freed

$ valgrind ./bin/str_substr test 1 2
==13515== Memcheck, a memory error detector
==13515== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==13515== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==13515== Command: ./bin/str_substr test 1 2
==13515==

 sub: es

==13515==
==13515== HEAP SUMMARY:
==13515==     in use at exit: 0 bytes in 0 blocks
==13515==   total heap usage: 1 allocs, 1 frees, 4 bytes allocated
==13515==
==13515== All heap blocks were freed -- no leaks are possible
==13515==
==13515== For counts of detected and suppressed errors, rerun with: -v
==13515== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)

4 Comments

in char* sub = calloc (1, sizeof *sub + len + 1);, sizeof *sub + len + 1 seems like it should be len + 1. BTW: Could return memcpy (sub, src + start, len);.
Correct you are on both points. I left the return sub; rather than the return memcpy.... just because it adds another level of understanding that may be bewildering to those just learning.
the first argument to calloc is the number of elements, in this case len+1. The second argument is the size of the elements , in this case char. I think it should be: char *sub = (char *)calloc(len + 1, sizeof(char));
@DerekJohnson - sizeof(char) is defined as 1 -- so it can be omitted.... (a good compiler will optimize it out anyway)

Let's break down what is being talked about:

  1. You allocate some memory, and you create the variable test to point to it.
  2. You allocate some more memory, and you'd like to store that pointer in the variable named test as well.

You have 2 pieces of information that you claim you'd like to store in the same pointer. You can't do this!

Solution 1

Use two variables. I don't know why this isn't acceptable...

char *input = "hello";
char *output = substr(input, 2, 3);

Solution 2

Have your input parameter not be heap memory. There's a number of ways we could do this:

// Use a string literal
char *test = substr("test", 2, 2);

// Use a stack allocated string
char s[] = "test";
char *test = substr(s, 2, 2);

Personally...

If you're already passing the length of the substring to the function, I'd personally rather see that function just get passed the piece of memory that it will push the data into. Something like:

char *substr(char *dst, char *src, size_t offset, size_t length) {
    memcpy(dst, src + offset, length);
    dst[length] = '\0';
    return dst;
}

int main() {
    char s[5] = "test";
    char d[3] = "";

    substr(d, s, 2, 2);
}
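
The version above trusts the caller to pass a large enough dst and a range that lies inside src. A hedged sketch of a variant that checks both; the name substr_into and the extra dst_size parameter are assumptions added for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical variant: dst_size is the capacity of dst in bytes.
 * Returns dst on success, NULL if the range or the buffer does not fit. */
char *substr_into(char *dst, size_t dst_size,
                  const char *src, size_t offset, size_t length)
{
    size_t n = strlen(src);
    if (offset > n || length > n - offset)
        return NULL;                     /* range runs past the end of src */
    if (length >= dst_size)
        return NULL;                     /* no room for the text plus '\0' */

    memcpy(dst, src + offset, length);   /* dst and src must not overlap */
    dst[length] = '\0';
    return dst;
}
```

With char d[3], the call substr_into(d, sizeof d, s, 2, 2) fills d with "st", while asking for three characters returns NULL instead of writing past the end of d.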

2 Comments

I'd expect strcpy(dst, src + offset, length); to take 2 arguments, not 3. Were you thinking memcpy()?
@chux: I was thinking of something that would do more work for me :)

In C, string functions quickly run into memory management. The space for the sub-string has to come from somewhere: either it already exists and is passed to the function, or the function allocates it.

const char source[] = "Test";
size_t start, length;

char sub1[sizeof source];
substring1(source, sub1, start, length);
// or
char *sub2 = substring2(source, start, length);
...
free(sub2);

Code needs to specify what happens when 1) the start index is greater than the original string's length and 2) the length similarly exceeds the original string. These are 2 important steps not done in OP's code. Here, both values are clamped to the end of the string.

void substring1(const char *source, char *dest, size_t start, size_t length) {
  size_t source_len = strlen(source);
  if (start > source_len) start = source_len;
  if (length > source_len - start) length = source_len - start;  /* no overflow: start <= source_len here */
  memmove(dest, &source[start], length);
  dest[length] = 0;
}

char *substring2(const char *source, size_t start, size_t length) {
  size_t source_len = strlen(source);
  if (start > source_len) start = source_len;
  if (length > source_len - start) length = source_len - start;  /* no overflow: start <= source_len here */
  char *dest = malloc(length + 1);
  if (dest == NULL) {
    return NULL;
  }
  memcpy(dest, &source[start], length);
  dest[length] = 0;
  return dest;
}

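By using memmove() rather than memcpy() in substring1(), code can use the same buffer as both source and destination. memmove() is well defined even if the buffers overlap. The buffer must be writable for this, so it cannot be a const array like source above:

char buffer[] = "Test";
substring1(buffer, buffer, start, length);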
