3

I'm looking for a way to check if a specific string exists in a large array of strings. The array is multi-dimensional: all_strings[strings][chars];. So essentially, this array is an array of character arrays. Each character array ends in '\0'

Given another array of characters, I need to check to see if those characters are already in all_strings, kind of similar to the python in keyword.

I'm not really sure how to go about this at all, I know that strcmp might help but I'm not sure how I could implement it.

1
  • 2
    Jjust loop through your all_strings list, string by string and do strcmp against the string (other array of characters) you want to compare. Commented Apr 6, 2015 at 1:17

3 Answers 3

5

As lurker suggested, the naive method is to simply loop on the array of strings calling strcmp. His string_in function is unfortunately broken due to a misunderstanding regarding sizeof(string_list), and should probably look like this:

#include <string.h>
int string_in(char *needle, char **haystack, size_t haystack_size) {
    for (size_t x = 0; x < haystack_size; x++) {
         if (strcmp(needle, haystack[x]) == 0) {
             return 1;
         }
    }
    return 0;
}

This is fairly inefficient, however. It'll do if you're only going to use it once in a while, particularly on a small collection of strings, but if you're looking for an efficient way to perform the search again and again, changing the search query for each search, the two options I would consider are:

  • If all_strings is relatively static, you could sort your array like so: qsort(all_strings, strings, chars, strcmp);... Then when you want to determine whether a word is present, you can use bsearch to execute a binary search like so: char *result = bsearch(search_query, all_strings, strings, chars, strcmp);. Note that when all_strings changes, you'll need to sort it again.
  • If all_strings changes too often, you'll probably benefit from using some other data structure such as a trie or a hash table.
Sign up to request clarification or add additional context in comments.

Comments

3

Use a for loop. C doesn't have a built-in like Python's in:

int i;

for ( i = 0; i < NUM_STRINGS; i++ )
    if ( strcmp(all_strings[i], my_other_string) == 0 )
        break;

// Here, i is the index of the matched string in all_strings.
//   If i == NUM_STRINGS, then the string wasn't found

If you want it to act like Python's in, you could make it a function:

// Assumes C99
#include <string.h>
#include <stdbool.h>

bool string_in(char *my_str, char *string_list[], size_t num_strings)
{
    for ( int i = 0; i < num_strings; i++ )
        if (strcmp(my_str, string_list[i]) == 0 )
            return true;

    return false;
}

4 Comments

In fact, C++11 has range loop, which is similar to Python's in.
@user4098326 indeed. But the question was restricted to C.
Your string_in function is wrong... Consider the value of sizeof string_list.
@undefinedbehaviour thanks for pointing that out. That's what I get for answering SO questions while I'm laid up with a cold and fever. I corrected it since I can't delete an accepted answer.
1

You could simply check if a string exists in an array of strings. A better solution might be to actually return the string:

/*
 * haystack: The array of strings to search.
 * needle: The string to find.
 * max: The number of strings to search in "haystack".
 */
char *
string_find(char **haystack, char *needle, size_t max)
{
    char **end = haystack + max;
    for (; haystack != end; ++haystack)
        if (strcmp(*haystack, needle) == 0)
            return *haystack;
    return NULL;
}

If you're wanting the behavior of a set, where all strings in the array are unique, then you can use it that way:

typedef struct set_strings {
    char **s_arr;
    size_t count;
    size_t max;
} StringSet;
.
.
.
int
StringSet_add(StringSet *set, const char *str)
{
    // If string exists already, the add operation is "successful".
    if (string_find(set->s_arr, str, set->count))
        return 1;

    // Add string to set and return success if possible.
    /*
     * Code to add string to StringSet would go here.
     */
    return 1;
}

If you want to actually do something with the string, you can use it that way too:

/*
 * Reverse the characters of a string.
 *
 * str: The string to reverse.
 * n: The number of characters to reverse.
 */
void
reverse_str(char *str, size_t n)
{
    char c;
    char *end;

    for (end = str + n; str < --end; ++str) {
        c = *str;
        *str = *end;
        *end = c;
    }
}
.
.
.
    char *found = string_find(words, word, word_count);
    if (found)
        reverse_str(found, strlen(found));

As a general-purpose algorithm, this is reasonably useful and even can be applied to other data types as necessary (some re-working would be required of course). As pointed out by undefined behaviour's answer, it won't be fast on large amounts of strings, but it is good enough for something simple and small.

If you need something faster, the recommendations given in that answer are good. If you're coding something yourself, and you're able to keep things sorted, it's a great idea to do that. This allows you to use a much better search algorithm than a linear search. The standard bsearch is great, but if you want something suitable for fast insertion, you'd probably want a search routine that would provide you with the position to insert a new item to avoid searching for the position after bsearch returns NULL. In other words, why search twice when you can search once and accomplish the same thing?

4 Comments

I like that you've adapted my code to return the string from the table. That's a good step in the direction of removing strings from the collection. I also like that the interface you've introduced makes it simple to change StringSet_add to use some other collection. Do you commonly develop associative array data structures and algorithms? I'd be interested in following your github profile, if you have one.
@undefinedbehaviour I don't have a github profile honestly. I also don't develop such structures and associated algorithms commonly, but I've found that when I do, a common pattern in my code is similar to what I posted above.
I find that when fast insertion (or removal, or any other variation for that matter) is required, that's when other data structures are appropriate. I tend to be biased towards PATRICIA, because as you mentioned you can return the position to insert at (though the process of inserting requires more information and is hence more complex than just plonking it there). I am curious, though... What's with the reverse_str? What happens if you come across a palindrome?
@undefinedbehaviour Regarding your curiosity, I'm not sure what you're referring to, except perhaps the fact that I used str != --end instead of str < --end, which would have resulted in terrible things when specifying an even number of bytes to swap. That's fixed now though. If you're asking about efficiency (why didn't I add an if (*str == *end) condition inside the loop?), then it's because the unconditional swap is faster on my machine and likely other machines as well (branches are "slow" compared with some other instructions, especially data movement instructions).

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.