I have a data frame that looks like the following:
import pandas as pd
page = ['A','B','C','D']
URL = ['aaa.bbb3333.ccc.de12345.dddd.cccc','ccc2222.ddd.aaa.ho16589.ddd','ddd16893.aaa.de59875','aaa15875.ccc.ddd.ho13532']
df = pd.DataFrame({'page':page,'URL':URL})
I want to create a column that extracts the number following either 'de' or 'ho'. Note that the length of the number can vary, and the position of 'de' or 'ho' within the URL can vary as well.
My code is below:
import re
def extract_number(df,url):
    for url in df:
        if df[url].str.contains('de', na = False) == True:
            match = re.search('de:P(\d+)')
        elif df[url].str.contains('ho', na = False) == True:
            match = re.search('ho:P(\d+)')
        else:
            match = 'not found'
        print(match)
out = extract_number(df, 'URL')
It raises the error 'ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().'
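As far as I can tell, the error comes from the `if` line: `str.contains` returns a boolean Series with one value per row, and `if` needs a single True/False. Here is a minimal sketch that reproduces it (the two-row `urls` Series is made up for illustration and is not my real data):

```python
import pandas as pd

# Hypothetical two-row Series, only to reproduce the error
urls = pd.Series(['aaa.de12345', 'bbb.ho16589'])

# Element-wise result: one boolean per row, not a single bool
mask = urls.str.contains('de', na=False)

try:
    if mask == True:  # `if` cannot reduce a multi-row Series to one bool
        pass
    raised = False
except ValueError as exc:
    raised = True
    message = str(exc)

# Reducing the Series explicitly gives a single bool
any_match = mask.any()
```

So a row-by-row `if` on the whole column does not seem to be the right tool here, which is probably why the message suggests `a.any()` or `a.all()`.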
The desired output should look like the following:
import pandas as pd
page = ['A','B','C','D']
URL = ['aaa.bbb3333.ccc.de12345.dddd.cccc','ccc2222.ddd.aaa.ho16589.ddd','ddd16893.aaa.de59875','aaa15875.ccc.ddd.ho13532']
ID = ['12345','16589','59875','13532']
df = pd.DataFrame({'page':page,'URL':URL,'ID':ID})
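One possible approach, as a rough sketch rather than a definitive solution, might be a vectorized `Series.str.extract` with a single regular expression. It assumes the digits always follow 'de' or 'ho' directly, and that the first such occurrence in each URL is the one wanted:

```python
import pandas as pd

page = ['A', 'B', 'C', 'D']
URL = ['aaa.bbb3333.ccc.de12345.dddd.cccc',
       'ccc2222.ddd.aaa.ho16589.ddd',
       'ddd16893.aaa.de59875',
       'aaa15875.ccc.ddd.ho13532']
df = pd.DataFrame({'page': page, 'URL': URL})

# (?:de|ho) matches either prefix without capturing it;
# (\d+) captures the run of digits that immediately follows.
# expand=False returns a Series; rows with no match become NaN.
df['ID'] = df['URL'].str.extract(r'(?:de|ho)(\d+)', expand=False)
```

The extracted values are strings, as in the desired output. Rows without a match would come back as NaN, which could be replaced with `df['ID'].fillna('not found')` if a placeholder is preferred.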
Thanks a million!