I'm trying to drop rows from a dataframe whose email addresses fuzzy match (by domain) items in a list.
I have a dataframe (test_df) that looks like:
   id            email           created_at
0   1   son@mail_a.com  2017-01-21 18:19:00
1   2   boy@mail_b.com  2017-01-22 01:19:00
2   3  girl@mail_c.com  2017-01-22 01:19:00
I'm reading a list of a few hundred email domains from a .txt file that looks like:
mail_a.com
mail_d.com
mail_e.com
I'm trying to drop from the dataframe any row that contains a matching email domain using:
with open('file.txt', 'r') as email_domains:
    to_drop = email_domains.read().splitlines()
dropped_df = test_df[~test_df['email'].isin(to_drop)]
print(dropped_df)
So, the result should look like:
   id            email           created_at
0   2   boy@mail_b.com  2017-01-22 01:19:00
1   3  girl@mail_c.com  2017-01-22 01:19:00
But the first row with "son@mail_a.com" is not dropped. Any suggestions?
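A minimal sketch, assuming the cause is that `Series.isin` only tests exact equality of whole values, so `son@mail_a.com` is never equal to `mail_a.com`. The sketch compares only the part after the `@` instead. The frame is rebuilt from the sample above, `io.StringIO` stands in for `file.txt`, and the lowercasing and whitespace stripping are assumptions about the real data, not something stated here.

```python
import io

import pandas as pd

# Rebuild the sample frame from the question.
test_df = pd.DataFrame({
    "id": [1, 2, 3],
    "email": ["son@mail_a.com", "boy@mail_b.com", "girl@mail_c.com"],
    "created_at": pd.to_datetime([
        "2017-01-21 18:19:00",
        "2017-01-22 01:19:00",
        "2017-01-22 01:19:00",
    ]),
})

# Stand-in for file.txt (assumption: one domain per line).
domains_file = io.StringIO("mail_a.com\nmail_d.com\nmail_e.com\n")
# Strip whitespace, lowercase, and skip blank lines (assumed cleanup).
to_drop = {line.strip().lower() for line in domains_file if line.strip()}

# Comparing full addresses finds no matches: prints [False, False, False].
print(test_df["email"].isin(to_drop).tolist())

# Keep only the text after the last "@" and compare that instead.
domains = test_df["email"].str.split("@").str[-1].str.lower()
dropped_df = test_df[~domains.isin(to_drop)].reset_index(drop=True)
print(dropped_df)
```

Because this compares the domain exactly, an address on a subdomain such as `user@sub.mail_a.com` would not be dropped. Whether that matters depends on the real list.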