I have a dataframe of movies, each with a synopsis:
Title   Synopsis
Movie1  Old Macdonald had a farm [Written by ABC rewrite]
Movie2  Wheels on the bus (Source: Melon)
Movie3  Tayo the bus [Produced by Wills Garage]
Movie4  James and Giant Apple (Source: Kismet)
I'd like to remove the trailing bracketed text, which is not needed for NLP, so that I get the dataframe below:
Title   Synopsis
Movie1  Old Macdonald had a farm
Movie2  Wheels on the bus
Movie3  Tayo the bus
Movie4  James and Giant Apple
I've tried the following code, but every row of my Synopsis column ends up containing a string like "0"Iodfosomhgooad,somh...\n1GaBauadFal...". How can I resolve this? I'd appreciate any help, thank you.
import re

removelist = [('[Written by]', ''), ('(Source:)', '')]
for old, new in removelist:
    df['Synopsis'] = re.sub(old, new, str(df['Synopsis']))
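For reference, two things in the attempt above likely explain the garbled output. First, `str(df['Synopsis'])` converts the whole column into a single string (its printed representation, index included), and that one string is then assigned to every row. Second, in a regular expression `[Written by]` is a character class that matches any single one of those characters, and `(Source:)` is a group that matches only the literal text `Source:`. A minimal sketch of one possible fix follows. It uses the vectorized `Series.str.replace` with `regex=True` and assumes that any trailing `[...]` or `(...)` group at the end of a synopsis can be dropped. The variable name `trailing_note` is made up for illustration.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "Title": ["Movie1", "Movie2", "Movie3", "Movie4"],
    "Synopsis": [
        "Old Macdonald had a farm [Written by ABC rewrite]",
        "Wheels on the bus (Source: Melon)",
        "Tayo the bus [Produced by Wills Garage]",
        "James and Giant Apple (Source: Kismet)",
    ],
})

# Assumption: the unwanted text is always one [...] or (...) group at the
# very end of the string. Brackets are escaped so they are matched literally
# instead of being read as a character class or a capture group.
trailing_note = r"\s*(?:\[[^\]]*\]|\([^)]*\))\s*$"

# Apply the substitution to each row, not to the stringified column.
df["Synopsis"] = df["Synopsis"].str.replace(trailing_note, "", regex=True)

print(df)
```

If some synopses legitimately end with a parenthetical that should be kept, the pattern could be narrowed to the known prefixes, for example `r"\s*(?:\[(?:Written|Produced) by[^\]]*\]|\(Source:[^)]*\))\s*$"`.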