I have a dataframe in which each tag has its own column: a cell holds the tag name if that tag is assigned to the row's text, and NaN otherwise. I want to create a tags column that contains, for each row, a list of the tags assigned to it, without the NaN values.
I can remove NaN from a single list, but I am unsure of the most efficient way to remove them from every list in the tags column. My dataframe contains 30,000 rows.
Any help would be greatly appreciated!
import numpy as np
import pandas as pd
df = pd.DataFrame(data={
    'text': ['Quinbrook acquires planned 350 MW project',
             'Australian rooftop solar to shine bright',
             'The US installed 5.7 GW of solar in Q2'],
    'acquisition': ['acquisition', np.nan, np.nan],
    'tender': [np.nan, np.nan, np.nan],
    'opinion': [np.nan, 'opinion', np.nan]})
# get names of the tags
tags = list(df.columns)
tags.remove('text')
# Create tags column
df['tags'] = df[tags].values.tolist()
# Remove NaN values from a single list
[x for x in df['tags'][0] if str(x) != 'nan']
# ['acquisition']
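
For reference, here is a minimal sketch of one way the same filter could be applied to every row at once. It is not necessarily the fastest option. It assumes the tag columns are every column except 'text', and the name tag_columns is only illustrative. It uses pd.notna instead of the string comparison so that a tag literally spelled 'nan' would not be dropped.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'text': ['Quinbrook acquires planned 350 MW project',
             'Australian rooftop solar to shine bright',
             'The US installed 5.7 GW of solar in Q2'],
    'acquisition': ['acquisition', np.nan, np.nan],
    'tender': [np.nan, np.nan, np.nan],
    'opinion': [np.nan, 'opinion', np.nan],
})

# Assumption: every column other than 'text' is a tag column
tag_columns = [col for col in df.columns if col != 'text']

# Walk the rows of the underlying array once and keep only non-missing values
df['tags'] = [
    [tag for tag in row if pd.notna(tag)]
    for row in df[tag_columns].to_numpy()
]

print(df['tags'].tolist())
# [['acquisition'], ['opinion'], []]
```

Rows with no tags end up with an empty list rather than being dropped, which may or may not be the desired behaviour.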

