Fastest way to check if an item is in a list - Python [duplicate]

Question

I'm having a problem building a vocabulary of words in Python. My code goes through every word in a document of about 2.3 MB and checks whether the word is already in the vocabulary; if it is not, it appends the word to the list.

The problem is that it is taking way too long (I haven't even gotten it to finish yet). How can I solve this?

Code:

words = [("_", "hello"), ("hello", "world"), ("world", "."), (".", "_")] # List of a ton of tuples of words
vocab = []
for w in words:
    if w not in vocab:
        vocab.append(w)

How many words do you have there? And why not use set() instead of a list?
– Dekel, Commented Dec 27, 2016 at 0:17
Can you provide a copy of the words you are checking against?
– TheLazyScripter, Commented Dec 27, 2016 at 0:18

Alex Hall · Accepted Answer · 2016-12-27 00:26:10Z

3

Unless you need vocab to have a particular order, you can just do:

vocab = set(words)
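If the order does matter, the following is a minimal sketch of two common order-preserving alternatives, not part of the original answer. It assumes Python 3.7 or later, where dict preserves insertion order; the names `seen` and `vocab_loop` are introduced here only for illustration, and one duplicate tuple has been added to the sample data.

```python
words = [("_", "hello"), ("hello", "world"), ("world", "."), (".", "_"),
         ("hello", "world")]  # last tuple is a deliberate duplicate

# Option 1: dict keys are unique and keep insertion order (Python 3.7+).
vocab = list(dict.fromkeys(words))

# Option 2: keep a list for order and a helper set for fast membership checks.
seen = set()
vocab_loop = []
for w in words:
    if w not in seen:  # set lookup instead of scanning the whole list
        seen.add(w)
        vocab_loop.append(w)

print(vocab == vocab_loop)  # True
```

Both options keep the first occurrence of each tuple and drop later repeats, which matches what the original loop produces.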


3 Comments

N. Chalifour Over a year ago

But what if a word appears more than once in the words list? I don't want any duplicates in my vocabulary. @AlexHall

Alex Hall Over a year ago

@N.Chalifour yup, sets don't have duplicates.

N. Chalifour Over a year ago

thanks! it worked like a charm.

ettanany · Accepted Answer · 2016-12-27 00:32:56Z

2

The following test compares the execution time of a for loop with that of set():

import random
import string
import time


# 1000 random 5-letter words, repeated 10 times to create duplicates
words = [''.join(random.sample(string.ascii_letters, 5)) for _ in range(1000)] * 10

# Approach 1: check list membership inside a loop
vocab1 = []

t1 = time.time()
for w in words:
    if w not in vocab1:
        vocab1.append(w)
t2 = time.time()

# Approach 2: build a set directly
t3 = time.time()
vocab2 = set(words)
t4 = time.time()

print(t2 - t1)  # for loop
print(t4 - t3)  # set()

Output:

0.0880000591278  # Using for loop
0.000999927520752  # Using set()
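The gap comes from the membership test itself: `in` on a list scans the elements one by one, while `in` on a set is a hash lookup. The sketch below isolates that single operation with the standard `timeit` module. The sizes and the names `as_list`, `as_set`, and `missing` are arbitrary choices for illustration, and the actual timings will vary by machine.

```python
import timeit

items = list(range(10_000))
as_list = items
as_set = set(items)
missing = -1  # not present, so the list check has to compare every element

# Time 1000 membership tests against each container.
list_time = timeit.timeit(lambda: missing in as_list, number=1_000)
set_time = timeit.timeit(lambda: missing in as_set, number=1_000)

print(f"list: {list_time:.6f} s")
print(f"set:  {set_time:.6f} s")
```

Because the loop in the question repeats this check once per word, the cost of the list version grows roughly with the square of the input size.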

