
EDIT: This coincides with my interest from the answer here:

Currently I use the following, but it obviously does not scale if one needs to search for str3, str4, and so on.

size_t find(const std::string& line, const std::string& str1, const std::string& str2, int pos) {
    int eol1 = line.find(str1,pos);
    int eol2 = line.find(str2,pos);
    return (eol1 < eol2) ? eol2 : eol1;
}

size_t find(const std::string& line, std::vector<std::string> vect, int pos ) {
    int eol1; 
    eol1 = 0;
    for (std::vector<std::string>::iterator iter = vect.begin(); iter != vect.end(); ++iter){
        //std::cout << *iter << std::endl;
        int eol2 = line.find(*iter, pos);
        if (eol1 == 0 && eol2 > 0)
            eol1 = eol2;
        else if ( eol2 > 0 && eol2 < eol1)
            eol1 = eol2;
    }
    return eol1;
}

Question: Why does std::begin() work for a "static" array but not for a "dynamic" one, and what is the simplest or most efficient alternative?

Curiously, I have frequently searched for two or three words at once in Fortran routines, but no compact "multi-string search" function seems to be in common use in the C++ community. Does one have to implement the complicated "grep" family or use "regex" to get this functionality?

 bool contains(const std::string& input, const std::string keywords[]){//cannot work
    //std::string keywords[] = {"white","black","green"}; // can work
    return std::any_of(std::begin(keywords), std::end(keywords),
        [&](const std::string& str) {return input.find(str) != std::string::npos; });
}

Why does the "vectorized" version not work either?

bool contains(const std::string& input, const std::vector<std::string> keywords){
// do not forget to make the array static!
//std::string keywords[] = {"white","black","green"};
return std::any_of(std::begin(keywords), std::end(keywords),
    [&](const std::string& str) {return input.find(str) != std::string::npos; });
}

Appended: I am in the middle of learning about parameter packs, but something still goes wrong ...

//base
size_t fin(const std::string& line, const std::string& str1) {
    std::cout << var1 << std::endl;
    return line.find(str1);
}
//vargin
template <typename... Types>
size_t fin(const std::string& line, const Types... var1) {
    return fin(line, var1...);
}
  • When you say "static" and "dynamic" I assume you mean plain arrays (for "static") and pointers (for "dynamic"), is that correct? Then think about what a pointer is pointing to: it only points to a single object, and there's no built-in knowledge in the language that more data might follow that single object. And even when you (and your program) know that there are more objects, how would the system know where they end? Commented Dec 1, 2021 at 10:56
  • And please take some time to refresh the help pages, especially "What topics can I ask about here?" and "What types of questions should I avoid asking?". Also take the tour and read about How to Ask good questions and this question checklist. One question per question, please. Commented Dec 1, 2021 at 10:57
  • const std::string keywords[] isn't static or dynamic - it's just a function argument undergoing array decay to a pointer. Just pass a std::vector, or a std::array, or an array ref if you don't want raw arrays to behave like raw arrays always do. Commented Dec 1, 2021 at 10:59
  • By the way, if by the term "vectorized" you mean that you use std::vector, then that's not the correct term. Vectorization is how to calculate using multiple values in parallel. Commented Dec 1, 2021 at 11:05
  • Also, as an argument const std::string keywords[] is parsed as const std::string* keywords. It's not an array, just a pointer. Commented Dec 1, 2021 at 11:06
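
The array decay described in these comments can be sketched as follows. This is a minimal illustration rather than code from the question: contains_any is a hypothetical name, and the template parameter N is an assumption that lets the compiler deduce the array length, so that std::begin and std::end have a real array to work with.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical helper: taking the array by reference keeps its length N
// in the type, so std::begin/std::end still work inside the function.
template <std::size_t N>
bool contains_any(const std::string& input, const std::string (&keywords)[N]) {
    return std::any_of(std::begin(keywords), std::end(keywords),
        [&](const std::string& str) { return input.find(str) != std::string::npos; });
}

// By contrast, a parameter written as `const std::string keywords[]` is
// adjusted to `const std::string* keywords`. A pointer carries no length,
// so the caller has to pass the element count separately.
bool contains_any(const std::string& input, const std::string* keywords, std::size_t count) {
    assert(keywords != nullptr || count == 0); // precondition of this sketch
    return std::any_of(keywords, keywords + count,
        [&](const std::string& str) { return input.find(str) != std::string::npos; });
}
```

Given `const std::string colors[] = {"white", "black", "green"};`, the call `contains_any(line, colors)` selects the array-reference overload, while `contains_any(line, colors, 3)` uses the pointer overload with an explicit count.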

2 Answers


Coming at it from a different angle, maybe an array of strings isn't the best container to check against. I would advise using std::set.

#include <cassert>
#include <iostream>
#include <set>
#include <string>
#include <string_view>

std::set<std::string_view> keywords{ "common", "continue", "data", "dimension" };
std::set<char> delimiters{ ' ', ',' , '.', '!', '?', '\n' };

inline bool is_keyword(const std::string_view& word)
{
    return keywords.find(word) != keywords.end();
}

inline bool is_delimiter(const char c)
{
    return delimiters.find(c) != delimiters.end();
}

bool contains_keyword(const std::string& sentence)
{
    auto word_begin = sentence.begin();
    auto word_end = sentence.begin();

    do
    {
        // create string views over each word 
        // words are found by looking for delimiters
        // string_view is used so no data is copied into temporaries
        // (constructing a string_view from an iterator pair requires C++20)
        while ((word_end != sentence.end()) && !is_delimiter(*word_end)) word_end++;
        std::string_view word{ word_begin,word_end };

        // stop as soon as keyword is found
        if (is_keyword(word)) return true;

        // skip delimiters
        while ((word_end != sentence.end()) && is_delimiter(*word_end)) word_end++;
        word_begin = word_end;

    } while (word_end != sentence.end());

    return false;
}

int main()
{
    std::string sentence_with_keyword{ "this input sentence, has keyword data in it" };
    bool found = contains_keyword(sentence_with_keyword);
    assert(found);

    if (found)
    {
        std::cout << "sentence contains keyword\n";
    }

    std::string sentence_without_keyword{ "this sentence will not contain any keyword!" };
    found = contains_keyword(sentence_without_keyword);
    assert(!found);

    return 0;
}

2 Comments

Good point; the widely used vector container kept me from thinking of <set>. But please note that the intention is to find whether any of the specified words appears in a long sentence.
Ah ok, updated the example. I also moved from strings to string_views to avoid any unnecessary copying of data, and added a function that can find keywords in a string. (You may need to fine-tune the set of delimiters, though.)
  1. Why does std::begin() work for a "static" array but not for a "dynamic" one, and what is the simplest or most efficient alternative?

What the code in the other answer is referring to is a static local variable.

// do not forget to make the array static!
static std::wstring keywords[] = {L"white",L"black",L"green", ...};

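The keyword static here is a shortcut: it gives keywords the lifetime of a global variable (it is constructed once, the first time the function runs, and lives until the program ends) while keeping the name scoped to the function. So a clearer way to represent what the author was saying might be this: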

// put this here...
std::wstring keywords[] = {L"white",L"black",L"green", ...};
    
bool ContainsMyWordsNathan(const std::wstring& input)
{
    //... instead of here
    return std::any_of(std::begin(keywords), std::end(keywords),
      [&](const std::wstring& str){return input.find(str) != std::wstring::npos;});
}

The code will work just fine if you use a std::vector or an array inside the function. But the list is then rebuilt on every call, which adds overhead.

When it's defined globally, the keyword list is constructed once and left in memory for the duration of the program.
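
The difference between rebuilding the list on every call and building it once can be observed with a small sketch. The names make_keywords, build_count and contains_color are hypothetical and exist only for this illustration; the counter is there purely to show how often the list is constructed.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical counter, used only to observe how often the list is built.
int build_count = 0;

std::vector<std::string> make_keywords() {
    ++build_count;
    return {"white", "black", "green"};
}

bool contains_color(const std::string& input) {
    // Function-local static: initialized the first time control reaches
    // this line, then reused by every later call.
    static const std::vector<std::string> keywords = make_keywords();
    assert(!keywords.empty()); // sanity check for this sketch
    return std::any_of(keywords.begin(), keywords.end(),
        [&](const std::string& str) { return input.find(str) != std::string::npos; });
}
```

However many times contains_color is called, build_count stays at 1. Dropping the static keyword would make it grow with every call.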


  2. I have frequently searched for two or three words at once in Fortran routines, but no compact "multi-string search" function seems to be in common use in the C++ community. Does one have to implement the complicated "grep" family or use "regex" to get this functionality?

C++ isn't really a compact one-liner kind of language. The <algorithm> header is meant to give you a way to express your algorithm in terms that make it clear what it's doing (std::any_of, std::count, std::copy_if, etc.).
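
As a rough illustration of that style, the sketch below expresses a multi-string search with <algorithm> and a std::vector of keywords. The function names contains_any_of, count_matches and find_earliest are hypothetical, and the approach still makes one pass over the text per keyword.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// True if at least one keyword occurs anywhere in input.
bool contains_any_of(const std::string& input, const std::vector<std::string>& keywords) {
    return std::any_of(keywords.begin(), keywords.end(),
        [&](const std::string& str) { return input.find(str) != std::string::npos; });
}

// Number of keywords that occur in input at least once.
std::ptrdiff_t count_matches(const std::string& input, const std::vector<std::string>& keywords) {
    return std::count_if(keywords.begin(), keywords.end(),
        [&](const std::string& str) { return input.find(str) != std::string::npos; });
}

// Smallest position at or after pos where any keyword starts,
// or std::string::npos if none of them is found.
std::size_t find_earliest(const std::string& input,
                          const std::vector<std::string>& keywords,
                          std::size_t pos = 0) {
    assert(pos <= input.size()); // precondition assumed by this sketch
    std::size_t best = std::string::npos;
    for (const std::string& str : keywords) {
        // npos is the largest size_t value, so std::min ignores misses.
        best = std::min(best, input.find(str, pos));
    }
    return best;
}
```

Keeping the positions as std::size_t avoids the narrowing to int that would otherwise turn std::string::npos into a negative number, and a match at position 0 is reported like any other.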

Your code searches for one keyword at a time, making a separate pass over the text for each. Instead of doing multiple searches over the text, you might consider tokenizing your string first by finding groups of alphanumeric characters. Then search a set or map to see if the word is a keyword, as the other answer suggests.

It's far from a compact one-liner, but here's how I would implement this:

#include <algorithm>
#include <cstddef>
#include <string_view>
#include <unordered_set>

bool is_alpha(const char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

bool is_not_alpha(const char c) {
    return !is_alpha(c);
}

std::unordered_set<std::string_view> keywords = { "red", "blue", "yellow" };

bool has_keyword(std::string_view input) {
    auto it = input.begin();
    while (it != input.end()) {
        // find a word
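        auto word_start = std::find_if(it, input.end(), is_alpha);
        if (word_start == input.end())
            break; // no more words; dereferencing end() below would be undefined behavior
        auto word_end = std::find_if(word_start, input.end(), is_not_alpha);
        std::string_view token { &*word_start, static_cast<size_t>(word_end - word_start) };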
        
        // test if it's a keyword
        if (keywords.find(token) != keywords.end())
            return true;

        it = word_end;
    }

    return false;
}

1 Comment

The word "static" has too many meanings in C++, and I have never learned them well enough.
