I am trying to read in a wordlist of 50,000 words and filter out all words that contain duplicate letters. I have already managed to select a random word and convert it to an array of chars, but how do I search that array for duplicates?
Generally the solution for this would be to sort and then use unique or adjacent_find. – Jonathan Mee, commented Dec 3, 2018 at 13:31
If you just have a raw list, you'll probably have to brute-force check the whole thing for duplicates. It might be worth your time to read the list in and store it in some ordered format, so that duplicates of any word end up next to each other; that lets you find the duplicate of any word without having to check 50,000 words every time. – Ryan, commented Dec 3, 2018 at 13:31
4 Answers
std::adjacent_find is your friend:
template< class ForwardIt > ForwardIt adjacent_find( ForwardIt first, ForwardIt last );
Searches the range [first, last) for two consecutive identical elements.
Return value
an iterator to the first of the first pair of identical elements [...] If no such elements are found, last is returned
First sort the array, then call adjacent_find on it and check whether it returns last. If it does, the word has no duplicate letters.
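A minimal sketch of the sort-then-adjacent_find approach, assuming the word is held in a std::string rather than a raw char array. The function name hasDuplicateLetters is made up for illustration and does not come from the question.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper (name is an assumption): true if any character
// occurs more than once in `word`.
// The word is taken by value so the caller's copy is not reordered.
bool hasDuplicateLetters(std::string word) {
    // After sorting, equal characters sit next to each other.
    std::sort(word.begin(), word.end());
    // adjacent_find returns end() when no two neighbours are equal.
    return std::adjacent_find(word.begin(), word.end()) != word.end();
}
```

Note that the comparison is case-sensitive, so 'A' and 'a' count as different letters. If that matters for your wordlist, you would need to convert the word to a single case first.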
You can also find duplicate words using hashing:
1. First, create a hash table.
2. Traverse the words one by one.
3. For every word, check whether it is already in the table. If it is, print the word; otherwise insert it into the table.
You can use an unordered_set<string> as the hash table.
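#include <iostream>
#include <string>
#include <unordered_set>
#include <vector>
using namespace std;

// Prints every word that already appeared earlier in the list.
void printDup(const vector<string>& words)
{
    unordered_set<string> s;  // words seen so far
    bool f = false;           // set to true once any duplicate is found
    // Start at index 0 so the first word is recorded as well.
    for (size_t i = 0; i < words.size(); i++)
    {
        if (s.find(words[i]) != s.end())
        {
            cout << words[i] << endl;
            f = true;
        }
        else
            s.insert(words[i]);
    }
    if (f == false)
        cout << "No Duplicate words" << endl;
}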
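The same hashing idea can be turned on the letters of a single word, which is closer to what the question asks. This is only a sketch: hasRepeatedLetter and withoutRepeatedLetters are hypothetical names, and it assumes the wordlist has already been read into a vector<string>.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: true if some character appears twice in `word`.
bool hasRepeatedLetter(const std::string& word) {
    std::unordered_set<char> seen;
    for (char c : word) {
        // insert() returns a pair; its bool is false when c was already present.
        if (!seen.insert(c).second) return true;
    }
    return false;
}

// Hypothetical helper: keeps only the words with no repeated letter,
// preserving their original order.
std::vector<std::string> withoutRepeatedLetters(const std::vector<std::string>& words) {
    std::vector<std::string> result;
    for (const std::string& w : words) {
        if (!hasRepeatedLetter(w)) result.push_back(w);
    }
    return result;
}
```

Unlike the sorting approach, this leaves the word untouched and stops at the first repeated letter.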
I guess that you have an array of char pointers, since you said "convert it to chars in an array", so the code would look something like this:
#include <cstring>
#include <iostream>
typedef const char* str;
str array[] = {"Hello", "how", "are", "you"};
bool isDuplicated(str word, str* array, int dimension);
int main() {
    // Element count: total size divided by the size of one element (a pointer), not sizeof(char).
    int length = sizeof(array) / sizeof(array[0]);
    str word = "Hello";
    std::cout << "The word " << word << " is duplicated: " << isDuplicated(word, array, length) << std::endl;
    std::cin.get();
}
// Returns true if word matches any of the first `dimension` entries of array.
bool isDuplicated(str word, str* array, int dimension) {
    bool duplicated = false;
    for (int i = 0; i < dimension; i++) {
        // Compare the characters; array[i] == word would only compare the pointer values.
        if (std::strcmp(array[i], word) == 0) {
            duplicated = true;
            break;
        }
    }
    return duplicated;
}
2 Comments
typedef const char* str or using str = const char*? You can't bind string literals to non-const. (It would be better to avoid aliases like this though because they hide the fact that you're working with pointers; str doesn't own but looks like it does)
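One way to sidestep the pointer alias entirely is to store the words as std::string values. The following is a sketch under that assumption; containsWord is a hypothetical name, not something from the answer above.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: true if `word` occurs in `words`.
// std::string's operator== compares contents, so no strcmp is needed.
bool containsWord(const std::vector<std::string>& words, const std::string& word) {
    return std::find(words.begin(), words.end(), word) != words.end();
}
```

With std::string the ownership is explicit, and the vector knows its own size, so there is no separate length parameter to get wrong.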