I have this DataFrame:
Title
Num
0 <span class="o-label--tiny">VALEUR ÉNERGÉTIQUE</span>
1 <span class="o-label--tiny">PROTÉINES</span>
2 <span class="o-label--tiny">GLUCIDES</span>
<class 'pandas.core.frame.DataFrame'> Num Index(['Title'], dtype='object')
This is what I want:
Title
Num
0 VALEUR ÉNERGÉTIQUE
1 PROTÉINES
2 GLUCIDES
This is the regex I developed:
(<span class=\"o-label--tiny\">)([a-zA-Z]+\s*\w*)(</span>)
Testing it, I see that it matches the whole initial string and has groups for the different substrings. In the end, I want group(2) in my DataFrame column. (My examples below use the explicit regex, but I have also tried them with the re.compile result, and that doesn't get me to my final result either.)
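A minimal sketch of pulling out group(2), assuming the `Title` cells are plain Python strings. The three sample values are copied from the output above, and the `pattern` variable name is chosen here for illustration. `Series.str.extract` returns one column per capture group, numbered from 0, so group(2) ends up in column `1`:

```python
import re

import pandas as pd

# Rebuild the sample frame from the printed output (assumes plain str cells)
df = pd.DataFrame(
    {"Title": [
        '<span class="o-label--tiny">VALEUR ÉNERGÉTIQUE</span>',
        '<span class="o-label--tiny">PROTÉINES</span>',
        '<span class="o-label--tiny">GLUCIDES</span>',
    ]},
    index=pd.Index([0, 1, 2], name="Num"),
)

# The regex from the question; group(2) is the label text
pattern = r'(<span class="o-label--tiny">)([a-zA-Z]+\s*\w*)(</span>)'

# Plain re: group(2) of the first cell
print(re.search(pattern, df["Title"].iloc[0]).group(2))  # VALEUR ÉNERGÉTIQUE

# pandas: extract gives columns 0, 1, 2 (one per group); column 1 is group(2)
df["Title"] = df["Title"].str.extract(pattern, expand=True)[1]
print(df)
```

Note that `[a-zA-Z]` only covers ASCII letters; in a value like `PROTÉINES` it stops at `PROT`, and the accented remainder `ÉINES` is picked up by `\w`, which is Unicode-aware for `str` patterns in Python 3.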
This is what I have tried:
df['Title'] = df['Title'].replace({'<span class=\"o-label--tiny\">': ''}, inplace=True, regex=True)
The result:
Title
Num
0 None
1 None
2 None
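For reference, a small sketch of what `inplace=True` does, on a standalone Series built from one sample value (assumed to be a plain string): the call modifies the Series itself and returns `None`, so assigning its return value stores `None`. Without `inplace`, the returned Series can be assigned back instead:

```python
import pandas as pd

s = pd.Series(['<span class="o-label--tiny">PROTÉINES</span>'])

# With inplace=True, replace() mutates s and returns None
returned = s.replace({'<span class="o-label--tiny">': ''}, regex=True, inplace=True)
print(returned)    # None
print(s.tolist())  # ['PROTÉINES</span>']

# Without inplace, replace() returns a new Series to assign back;
# here both the opening and closing tags are removed
t = pd.Series(['<span class="o-label--tiny">PROTÉINES</span>'])
t = t.replace({'<span class="o-label--tiny">': '', '</span>': ''}, regex=True)
print(t.tolist())  # ['PROTÉINES']
```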
Try number 2:
df['Title'] = df['Title'].str.replace('<span class=\"o-label--tiny\">', repl = '')
Result number 2:
Title
Num
0 NaN
1 NaN
2 NaN
Try number 3:
df['Title'] = df[lambda df: df.columns[0]].str.extract('(>[a-zA-Z]+\s*\w*)', expand=False)
Result number 3:
Title
Num
0 NaN
1 NaN
2 NaN
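Since two different `.str` methods both give NaN for every row, one thing that may be worth checking is whether the cells are really `str` objects: for non-string elements in an object column, pandas `.str` methods return NaN rather than raising. A sketch of that effect, where `FakeTag` is a made-up class standing in for a hypothetical non-string cell (for example a parsed HTML element) whose `str()` form is the HTML:

```python
import pandas as pd


class FakeTag:
    """Hypothetical stand-in for a non-string cell; only str() yields the HTML."""

    def __init__(self, html):
        self.html = html

    def __str__(self):
        return self.html


s = pd.Series([FakeTag('<span class="o-label--tiny">GLUCIDES</span>')])

# .str methods on non-string elements produce NaN instead of an error
print(s.str.replace('<span class="o-label--tiny">', '', regex=False))

# Converting to str first lets the question's regex match; column 1 is group(2)
pattern = r'(<span class="o-label--tiny">)([a-zA-Z]+\s*\w*)(</span>)'
fixed = s.astype(str).str.extract(pattern, expand=True)[1]
print(fixed.tolist())  # ['GLUCIDES']
```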
I really don't see what I am doing wrong; any help getting to my desired result would be appreciated. Thank you!