
I am trying to read in a wordlist of 50,000 words I have and filter out all words with duplicate letters. I have already managed to select a random word and convert it to chars in an array, but how do I search that array for duplicates?

  • Generally the solution for this would be to sort and then use std::unique or std::adjacent_find. Commented Dec 3, 2018 at 13:31
  • If you just have a raw list, you'll probably have to brute-force check the whole thing for duplicates. It might be worth your time to read the list in and store it in some ordered format, so that duplicates of any word end up next to each other and you can find them without checking all 50,000 words every time. Commented Dec 3, 2018 at 13:31

4 Answers

std::adjacent_find is your friend:

template< class ForwardIt >
ForwardIt adjacent_find( ForwardIt first, ForwardIt last );

Searches the range [first, last) for two consecutive identical elements.

Return value

an iterator to the first of the first pair of identical elements [...] If no such elements are found, last is returned

First sort the array, then call adjacent_find on it and check whether it returns last. If it does, no letter occurs twice; otherwise the word contains a duplicate letter.
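A minimal sketch of the sort-then-adjacent_find approach, assuming each word is held in a std::string. The helper names hasRepeatedLetter and keepDistinctLetterWords are hypothetical, and the comparison is case-sensitive ('A' and 'a' count as different letters).

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: true if any character occurs more than once in `word`.
// Takes the word by value because std::sort reorders the copy.
bool hasRepeatedLetter(std::string word) {
    std::sort(word.begin(), word.end());  // equal letters become neighbours
    return std::adjacent_find(word.begin(), word.end()) != word.end();
}

// Hypothetical helper: keeps only the words whose letters are all distinct,
// preserving the original order of the list.
std::vector<std::string> keepDistinctLetterWords(const std::vector<std::string>& words) {
    std::vector<std::string> result;
    std::copy_if(words.begin(), words.end(), std::back_inserter(result),
                 [](const std::string& w) { return !hasRepeatedLetter(w); });
    return result;
}
```

Sorting a single word is cheap, so running this over a 50,000-word list is a simple linear pass over the words.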

You can also find duplicate words using hashing:

  1. First, create a hash table.
  2. Traverse the words one by one.
  3. For every word, check whether it already exists. If it is already present, print the word; otherwise insert it into the hash table.

You can use unordered_set<string> as the hash table:

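#include <iostream>
#include <string>
#include <unordered_set>
#include <vector>
using namespace std;

// Prints every word that has already appeared earlier in the list.
void printDup(const vector<string>& words)
{
    unordered_set<string> s;
    bool f = false;  // set once any duplicate is found
    for (size_t i = 0; i < words.size(); i++)  // start at 0 so the first word is included
    {
        if (s.find(words[i]) != s.end())
        {
            cout << words[i] << endl;
            f = true;
        }
        else
            s.insert(words[i]);
    }
    if (!f)
        cout << "No duplicate words" << endl;
}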

1 Comment

You can insert unconditionally, and then check if the insertion is successful. It should be faster.
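A sketch of that suggestion: unordered_set::insert returns a std::pair<iterator, bool> whose bool is false when the element was already present, so the separate find call is unnecessary. The function name findDuplicateWords is hypothetical, and it returns the duplicates instead of printing them so the result is easy to check.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: collects every word that was already seen earlier in `words`.
std::vector<std::string> findDuplicateWords(const std::vector<std::string>& words) {
    std::unordered_set<std::string> seen;
    std::vector<std::string> duplicates;
    for (const std::string& w : words) {
        // insert(...).second is false if `w` was already in the set.
        if (!seen.insert(w).second) {
            duplicates.push_back(w);
        }
    }
    return duplicates;
}
```

A word that occurs three times is reported twice, once for each repeat after the first.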

You asked: "select a random word, convert it to chars in an array, but how do I search that array for duplicates?"

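Using Boost.Range, this may be boiled down to the following. Note that unique only removes adjacent duplicates, so the letters have to be sorted first:

boost::sort(letters);
const bool hasDuplicates = boost::size(letters) != boost::size(boost::unique(letters));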
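For readers without Boost, roughly the same check can be sketched with the standard library alone, assuming the letters are in a std::string. std::unique only collapses runs of equal adjacent elements and returns the end of the deduplicated prefix, which is why the letters are sorted first; if that end differs from the real end, something was collapsed. The function name hasDuplicates is hypothetical.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: true if any letter appears more than once.
// Takes the letters by value because sort and unique rearrange them.
bool hasDuplicates(std::string letters) {
    std::sort(letters.begin(), letters.end());            // group equal letters
    const auto uniqueEnd = std::unique(letters.begin(), letters.end());
    return uniqueEnd != letters.end();                    // something was collapsed
}
```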

I guess you have an array of char pointers, since you said "convert it to chars in an array", so the code would look like this:

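#include <cstring>
#include <iostream>

typedef const char* str;
str array [] = {"Hello", "how", "are", "you"};

bool isDuplicated (str word, str* array, int dimension);

int main() {

    // Number of elements: total size divided by the size of one element (a pointer).
    int length = sizeof(array) / sizeof(array[0]);
    str word = "Hello";
    std::cout << "The word " << word << " is duplicated: " << std::boolalpha
              << isDuplicated (word, array, length) << std::endl;
    std::cin.get();
}

// Returns true if `word` matches any of the first `dimension` entries of `array`.
bool isDuplicated(str word, str* array, int dimension) {
    bool duplicated = false;
    for(int i = 0; i < dimension; i ++) {
        // Compare the characters with strcmp; == would only compare the pointers.
        if(std::strcmp(array[i], word) == 0) {
            duplicated = true;
            break;
        }
    }
    return duplicated;
}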
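This answer checks whether a whole word appears in an array of words. The question, though, asks about repeated letters inside a single word; under that reading, the same linear-scan idea can be applied to the characters of one C string. The helper name hasRepeatedLetter is hypothetical and not part of the original answer; it compares every pair of positions, so it is quadratic in the word length, which is acceptable for short words.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: true if any character in the NUL-terminated
// string `word` appears more than once (case-sensitive).
bool hasRepeatedLetter(const char* word) {
    const std::size_t n = std::strlen(word);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            if (word[i] == word[j]) {
                return true;  // found a second occurrence of word[i]
            }
        }
    }
    return false;
}
```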

2 Comments

Don't you mean typedef const char* str or using str = const char*? You can't bind string literals to non-const. (It would be better to avoid aliases like this though because they hide the fact that you're working with pointers; str doesn't own but looks like it does)
I see, I'm going to change it, and yes, I think it's true but it may work in a simple algorithm, thanks for reminding me :)
