I have a DataFrame with a column named "New":
import pandas as pd

df = pd.DataFrame({'New': ['emerald shines bright(happy)(ABCED ID - 1234556)', 'honey in the bread(ABCED ID - 123467890)', 'http/ABCED/id/234555', 'healing strenght(AxYBD ID -1234556)', 'this is just a text'],
                   'UI': ['AOT', 'BOT', 'LOV', 'HAP', 'NON']})
Now I want to extract the numeric IDs into another column: for example, the digits after ABCED ID and AxYBD ID, and the id in the http path.
But when I use

df['New_col'] = df['New'].str.extract(r'.*\((.*)\).*', expand=True)

it doesn't work well: the whole contents of the parentheses (for instance ABCED ID - 1234556) are returned instead of just the number. Moreover, the http ID 234555 is not returned at all.
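To see the failure concretely, here is a minimal sketch that rebuilds the sample frame and runs the original pattern. It uses `expand=False` so the result is a Series; with `expand=True` you get a one-column DataFrame holding the same values. The greedy leading `.*` jumps to the last `(` in the string, so the group captures everything inside that last pair of parentheses, and the http row has no parentheses at all, so it produces NaN.

```python
import pandas as pd

df = pd.DataFrame({'New': ['emerald shines bright(happy)(ABCED ID - 1234556)',
                           'honey in the bread(ABCED ID - 123467890)',
                           'http/ABCED/id/234555',
                           'healing strenght(AxYBD ID -1234556)',
                           'this is just a text'],
                   'UI': ['AOT', 'BOT', 'LOV', 'HAP', 'NON']})

# The original pattern: capture whatever sits inside the LAST "( ... )" pair.
old = df['New'].str.extract(r'.*\((.*)\).*', expand=False)

# Row 0 gives 'ABCED ID - 1234556' (label and number together, not just digits);
# rows 2 (the http one) and 4 have no parentheses, so they come back as NaN.
print(old.tolist())
```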
Also, can someone show how to clean the first column to remove the ID in parentheses, so that the result looks like this:
   New                           UI   New_col
0  emerald shines bright(happy)  AOT  1234556
1  honey in the bread            BOT  123467890
2  http/ABCED/id/234555          LOV  234555
3  healing strenght              HAP  1234556
4  this is just a text           NON
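For the cleanup part, one option is `Series.str.replace` with a regex that removes only the ID suffix. This is a sketch under an assumption drawn from the sample rows: every ID suffix looks like "(<letters> ID - <digits>)" with optional spaces around the hyphen, so other parentheses such as "(happy)" are left alone. Run it after filling New_col, because it deletes the digits you want to extract.

```python
import pandas as pd

df = pd.DataFrame({'New': ['emerald shines bright(happy)(ABCED ID - 1234556)',
                           'honey in the bread(ABCED ID - 123467890)',
                           'http/ABCED/id/234555',
                           'healing strenght(AxYBD ID -1234556)',
                           'this is just a text'],
                   'UI': ['AOT', 'BOT', 'LOV', 'HAP', 'NON']})

# Assumption: the ID suffix is "(" + letters + " ID" + optional spaces + "-"
# + optional spaces + digits + ")". Plain words in parentheses do not match.
id_suffix = r'\s*\([A-Za-z]+ ID\s*-\s*\d+\)'

# Remove the suffix (and any whitespace before it), then trim the result.
df['New'] = df['New'].str.replace(id_suffix, '', regex=True).str.strip()
print(df['New'].tolist())
```

If the labels can contain digits or other characters, `[A-Za-z]+` would need to be widened, for example to `[^()]*?`.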


Use a non-capturing alternation for the two ID formats and capture only the digits that follow:

df['New_col'] = df['New'].str.extract(r'.*(?:\(\D*|http\S*/id/)(\d+)', expand=False)
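The same regex, `.*(?:\(\D*|http\S*/id/)(\d+)`, can be written in verbose form to show what each piece does. `str.extract` accepts a `flags` argument, so `re.VERBOSE` lets the pattern carry comments; the matching behaviour is unchanged. Unmatched rows (here the plain-text row) come back as NaN.

```python
import re

import pandas as pd

df = pd.DataFrame({'New': ['emerald shines bright(happy)(ABCED ID - 1234556)',
                           'honey in the bread(ABCED ID - 123467890)',
                           'http/ABCED/id/234555',
                           'healing strenght(AxYBD ID -1234556)',
                           'this is just a text'],
                   'UI': ['AOT', 'BOT', 'LOV', 'HAP', 'NON']})

pattern = r"""
    .*              # greedy: skip ahead to the LAST place an ID marker can start
    (?:             # non-capturing group: either ...
        \(\D*       #   "(" followed by non-digits, e.g. "(ABCED ID - "
      | http\S*/id/ #   or a URL-like token that ends in "/id/"
    )
    (\d+)           # the only capture group: the run of digits after the marker
"""

# expand=False with a single group returns a Series of strings (NaN if no match).
df['New_col'] = df['New'].str.extract(pattern, flags=re.VERBOSE, expand=False)
print(df)
```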