How to check if a string exists in an array of character arrays

Question

I'm looking for a way to check if a specific string exists in a large array of strings. The array is multi-dimensional: all_strings[strings][chars];. So essentially, this array is an array of character arrays. Each character array ends in '\0'

Given another array of characters, I need to check to see if those characters are already in all_strings, kind of similar to the python in keyword.

I'm not really sure how to go about this at all, I know that strcmp might help but I'm not sure how I could implement it.

Jjust loop through your all_strings list, string by string and do strcmp against the string (other array of characters) you want to compare. — lurker
– lurker, Commented Apr 6, 2015 at 1:17

autistic · Accepted Answer · 2015-04-06 05:19:13Z

As lurker suggested, the naive method is to simply loop on the array of strings calling strcmp. His string_in function is unfortunately broken due to a misunderstanding regarding sizeof(string_list), and should probably look like this:

#include <string.h>
int string_in(char *needle, char **haystack, size_t haystack_size) {
    for (size_t x = 0; x < haystack_size; x++) {
         if (strcmp(needle, haystack[x]) == 0) {
             return 1;
         }
    }
    return 0;
}

This is fairly inefficient, however. It'll do if you're only going to use it once in a while, particularly on a small collection of strings, but if you're looking for an efficient way to perform the search again and again, changing the search query for each search, the two options I would consider are:

If all_strings is relatively static, you could sort your array like so: qsort(all_strings, strings, chars, strcmp);... Then when you want to determine whether a word is present, you can use bsearch to execute a binary search like so: char *result = bsearch(search_query, all_strings, strings, chars, strcmp);. Note that when all_strings changes, you'll need to sort it again.
If all_strings changes too often, you'll probably benefit from using some other data structure such as a trie or a hash table.

lurker · Accepted Answer · 2015-04-06 11:39:36Z

3

Use a for loop. C doesn't have a built-in like Python's in:

int i;

for ( i = 0; i < NUM_STRINGS; i++ )
    if ( strcmp(all_strings[i], my_other_string) == 0 )
        break;

// Here, i is the index of the matched string in all_strings.
//   If i == NUM_STRINGS, then the string wasn't found

If you want it to act like Python's in, you could make it a function:

// Assumes C99
#include <string.h>
#include <stdbool.h>

bool string_in(char *my_str, char *string_list[], size_t num_strings)
{
    for ( int i = 0; i < num_strings; i++ )
        if (strcmp(my_str, string_list[i]) == 0 )
            return true;

    return false;
}

edited Apr 6, 2015 at 11:39

answered Apr 6, 2015 at 1:22

lurker

58.5k9 gold badges74 silver badges108 bronze badges

4 Comments

user4098326 Over a year ago

In fact, C++11 has range loop, which is similar to Python's in.

lurker Over a year ago

@user4098326 indeed. But the question was restricted to C.

autistic Over a year ago

Your string_in function is wrong... Consider the value of sizeof string_list.

lurker Over a year ago

@undefinedbehaviour thanks for pointing that out. That's what I get for answering SO questions while I'm laid up with a cold and fever. I corrected it since I can't delete an accepted answer.

Community · Accepted Answer · 2017-05-23 12:15:03Z

1

You could simply check if a string exists in an array of strings. A better solution might be to actually return the string:

/*
 * haystack: The array of strings to search.
 * needle: The string to find.
 * max: The number of strings to search in "haystack".
 */
char *
string_find(char **haystack, char *needle, size_t max)
{
    char **end = haystack + max;
    for (; haystack != end; ++haystack)
        if (strcmp(*haystack, needle) == 0)
            return *haystack;
    return NULL;
}

If you're wanting the behavior of a set, where all strings in the array are unique, then you can use it that way:

typedef struct set_strings {
    char **s_arr;
    size_t count;
    size_t max;
} StringSet;
.
.
.
int
StringSet_add(StringSet *set, const char *str)
{
    // If string exists already, the add operation is "successful".
    if (string_find(set->s_arr, str, set->count))
        return 1;

    // Add string to set and return success if possible.
    /*
     * Code to add string to StringSet would go here.
     */
    return 1;
}

If you want to actually do something with the string, you can use it that way too:

/*
 * Reverse the characters of a string.
 *
 * str: The string to reverse.
 * n: The number of characters to reverse.
 */
void
reverse_str(char *str, size_t n)
{
    char c;
    char *end;

    for (end = str + n; str < --end; ++str) {
        c = *str;
        *str = *end;
        *end = c;
    }
}
.
.
.
    char *found = string_find(words, word, word_count);
    if (found)
        reverse_str(found, strlen(found));

As a general-purpose algorithm, this is reasonably useful and even can be applied to other data types as necessary (some re-working would be required of course). As pointed out by undefined behaviour's answer, it won't be fast on large amounts of strings, but it is good enough for something simple and small.

If you need something faster, the recommendations given in that answer are good. If you're coding something yourself, and you're able to keep things sorted, it's a great idea to do that. This allows you to use a much better search algorithm than a linear search. The standard bsearch is great, but if you want something suitable for fast insertion, you'd probably want a search routine that would provide you with the position to insert a new item to avoid searching for the position after bsearch returns NULL. In other words, why search twice when you can search once and accomplish the same thing?

edited May 23, 2017 at 12:15

CommunityBot

11 silver badge

answered Apr 6, 2015 at 3:24

user539810

4 Comments

autistic Over a year ago

I like that you've adapted my code to return the string from the table. That's a good step in the direction of removing strings from the collection. I also like that the interface you've introduced makes it simple to change StringSet_add to use some other collection. Do you commonly develop associative array data structures and algorithms? I'd be interested in following your github profile, if you have one.

user539810 Over a year ago

@undefinedbehaviour I don't have a github profile honestly. I also don't develop such structures and associated algorithms commonly, but I've found that when I do, a common pattern in my code is similar to what I posted above.

autistic Over a year ago

I find that when fast insertion (or removal, or any other variation for that matter) is required, that's when other data structures are appropriate. I tend to be biased towards PATRICIA, because as you mentioned you can return the position to insert at (though the process of inserting requires more information and is hence more complex than just plonking it there). I am curious, though... What's with the reverse_str? What happens if you come across a palindrome?

user539810 Over a year ago

@undefinedbehaviour Regarding your curiosity, I'm not sure what you're referring to, except perhaps the fact that I used str != --end instead of str < --end, which would have resulted in terrible things when specifying an even number of bytes to swap. That's fixed now though. If you're asking about efficiency (why didn't I add an if (*str == *end) condition inside the loop?), then it's because the unconditional swap is faster on my machine and likely other machines as well (branches are "slow" compared with some other instructions, especially data movement instructions).

Collectives™ on Stack Overflow

How to check if a string exists in an array of character arrays

3 Answers 3

Comments

4 Comments

4 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

4 Comments

4 Comments

Your Answer

Sign up or log in

Post as a guest

Related