Remove entries from CSV file based on a list in Python

Question

I have a CSV file which has the following content:

Apple,Bat
Apple,Cat
Apple,Dry
Apple,East
Apple,Fun
Apple,Gravy
Apple,Hot
Bat,Cat
Bat,Dry
Bat,Fun
...

I also have a list as follows:

to_remove = ['Fun', 'Gravy', ...]

I would like an efficient way to delete every line of the CSV file that contains any of the words in to_remove.

One way I know of is to read the CSV file line by line, loop through to_remove to check whether any of its words appear in the line, and write the line to another file only if there is no match.

However, both the CSV file and the to_remove list are large (about 21,000 and 300 entries, respectively), so I want an efficient way of doing this in Python.
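A possible speed-up, sketched under the assumption that a word in to_remove should only match a whole CSV field (the question does not say whether 'Fun' should also remove a line containing 'Funny'): put the words in a set, so each membership test takes roughly constant time and the work per row depends on the number of columns rather than on the ~300 words. The function name filter_csv and its parameters are hypothetical; csv.reader, csv.writer and set.isdisjoint are standard library.

```python
import csv


def filter_csv(src_path, dst_path, to_remove):
    """Copy rows from src_path to dst_path, skipping any row that has a
    field exactly equal to one of the words in to_remove.

    Assumption: whole-field matching, so 'Fun' does NOT match 'Funny'.
    """
    banned = set(to_remove)  # average O(1) membership tests
    with open(src_path, newline='') as fin, \
            open(dst_path, 'w', newline='') as fout:
        reader = csv.reader(fin)
        # Use '\n' so the output keeps the same line endings as the sample input
        writer = csv.writer(fout, lineterminator='\n')
        for row in reader:  # rows are streamed one at a time
            # Keep the row only if none of its fields is a banned word
            if banned.isdisjoint(row):
                writer.writerow(row)
```

For the sample data, filter_csv('test.in', 'test.out', ['Fun', 'Gravy']) would keep every row except Apple,Fun, Apple,Gravy and Bat,Fun.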

I do not have access to a cluster, so MapReduce-based solutions are not an option.

You could try regular expressions or simply parallelise the code. There's only so much you can do. Huge operations will always be huge one way or another. – Aleksander Lidtke, Jan 25, 2014 at 11:50
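A rough sketch of the regular-expression idea: combine all the words into one compiled pattern, so each line is scanned by a single regex search instead of ~300 separate substring checks. Whether this is actually faster depends on the data, so it is worth timing against the other approaches. The function name filter_with_regex is hypothetical, and the \b word boundaries assume the words consist of letters, digits or underscores.

```python
import re


def filter_with_regex(src_path, dst_path, to_remove):
    """Copy lines from src_path to dst_path, dropping any line that contains
    one of the words in to_remove as a whole word."""
    # re.escape protects against regex metacharacters in the words;
    # empty strings are skipped because they would match everywhere.
    words = '|'.join(re.escape(w) for w in to_remove if w)
    # \b limits matches to whole words ('Fun' does not match 'Funny').
    # With no words, use a pattern that never matches so all lines are kept.
    pattern = re.compile(r'\b(?:' + words + r')\b' if words else r'(?!)')
    with open(src_path) as fin, open(dst_path, 'w') as fout:
        for line in fin:  # read lazily, one line at a time
            if not pattern.search(line):
                fout.write(line)
```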

alvas · Accepted Answer · 2014-01-25 12:03:40Z


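toremove = ['Fun', 'Gravy']

with open('test.in', 'r') as fin, open('test.out', 'w') as fout:
    # Keep only the lines that contain none of the words in toremove
    for line in filter(lambda x: not any(word in x for word in toremove), fin.readlines()):
        fout.write(line)

# Print the filtered file (print() works in both Python 2 and 3)
with open('test.out') as fout:
    print(fout.read())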

test.in:

Apple,Bat
Apple,Cat
Apple,Dry
Apple,East
Apple,Fun
Apple,Gravy
Apple,Hot
Bat,Cat
Bat,Dry
Bat,Fun

[out:]

Apple,Bat
Apple,Cat
Apple,Dry
Apple,East
Apple,Hot
Bat,Cat
Bat,Dry


Steinar Lima Over a year ago

fin.readlines() will read the entire file into memory. Not exactly what the OP wants.
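A minimal sketch of a lower-memory variant of the answer's code: iterating over the file object itself reads one line at a time, so memory use no longer grows with the size of the input. It keeps the answer's plain substring test, so 'Fun' also removes a line containing 'Funny'. The function name remove_lines is hypothetical.

```python
def remove_lines(src_path, dst_path, toremove):
    """Write every line of src_path that contains none of the strings in
    toremove to dst_path, reading the input one line at a time."""
    with open(src_path) as fin, open(dst_path, 'w') as fout:
        # Iterating over the file object yields lines lazily, and
        # writelines() consumes the generator without building a list.
        fout.writelines(line for line in fin
                        if not any(word in line for word in toremove))


# Usage with the answer's file names:
# remove_lines('test.in', 'test.out', ['Fun', 'Gravy'])
```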
