I have a data frame that looks like the following:
I want to remove every word that appears in a given list, e.g. ['King', 'sEAttle', 'California']. Here is my code:
import pandas as pd
import re
remove_words = ['King', 'sEAttle', 'California']
remove_words_lower = (map(lambda x: x.lower(), remove_words))
pattern = '|'.join(remove_words_lower)
t1 = 'Hello! @kingcounty Seattle, #California'
t2 = 'hello! seattlecity #king'
df = pd.DataFrame({'Id': ['user1', 'user2'], 'tweets': [t1, t2]})
clean_tweets = []
for i, tweet in enumerate(df.tweets):
    tweet = tweet.lower()
    clean_tweet = re.sub(pattern, "", tweet)
    clean_tweets.append(clean_tweet)
df['clean_tweets'] = clean_tweets
df
Here is the result:
Is there a way to modify the regex so that the leftover fragments (@county, city, and #) are removed as well? In other words, I want to remove the whole word if it contains any word from the given list. The pattern has to be as generic as possible (i.e. I can't hard-code @county to have it removed).
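To make the intent concrete, here is a rough sketch of the behaviour I have in mind. It is not a confirmed solution. It assumes that a "word" means a whitespace-delimited token, so punctuation attached to a match (such as the comma after Seattle) is dropped along with it. The names token_pattern and drop_matching_tokens are made up for illustration.

```python
import re

remove_words = ['King', 'sEAttle', 'California']

# Escape each word so any regex metacharacters are matched literally.
alternatives = '|'.join(re.escape(word) for word in remove_words)

# \S* on both sides stretches the match over the whole whitespace-delimited
# token, so a token is removed if a listed word appears anywhere inside it.
# IGNORECASE avoids having to lowercase the list or the text first.
token_pattern = re.compile(r'\S*(?:' + alternatives + r')\S*', flags=re.IGNORECASE)


def drop_matching_tokens(text):
    """Remove every whitespace-delimited token that contains a listed word."""
    without_tokens = token_pattern.sub('', text)
    # Collapse the runs of whitespace left behind by the removed tokens.
    return ' '.join(without_tokens.split())


tweets = ['Hello! @kingcounty Seattle, #California', 'hello! seattlecity #king']
cleaned = [drop_matching_tokens(tweet) for tweet in tweets]
```

On the data frame this could presumably be applied with df['tweets'].apply(drop_matching_tokens). Unlike my loop above, this sketch keeps the original casing of the remaining text; calling .lower() on the tweet first would restore that behaviour.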
Expected output:


