I have a data frame that looks like the following:

import pandas as pd
page = ['A','B','C','D']
URL = ['aaa.bbb3333.ccc.de12345.dddd.cccc','ccc2222.ddd.aaa.ho16589.ddd','ddd16893.aaa.de59875','aaa15875.ccc.ddd.ho13532']
df = pd.DataFrame({'page':page,'URL':URL})

I want to create a column that extracts the numbers following either 'de' or 'ho'. Note that the length of the numbers may differ, and the position of 'de' or 'ho' may differ as well.

My code looks like this:

import re
def extract_number(df,url):
    for url in df:
        if df[url].str.contains('de', na = False) == True:
            match = re.search('de:P(\d+)')
        elif df[url].str.contains('ho', na = False) == True:
            match = re.search('ho:P(\d+)')
        else:
            match = 'not found'
        print(match)

out = extract_number(df, 'URL')

It raises the error 'The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().'

The desired output should look like the following:

import pandas as pd
page = ['A','B','C','D']
URL = ['aaa.bbb.ccc.de12345.dddd.cccc','ccc.ddd.aaa.ho16589.ddd','ddd.aaa.de59875','aaa.ccc.ddd.ho13532']
ID = ['12345','16589','59875','13532']
df = pd.DataFrame({'page':page,'URL':URL,'ID':ID})

Million thanks!!!!

1 Answer

Use str.extract with a positive lookbehind, so that only the digits immediately following 'de' or 'ho' are captured:

df["num"] = df["URL"].str.extract(r"(?<=de|ho)(\d+)")

print(df)

Output:
  page                                URL    num
0    A  aaa.bbb3333.ccc.de12345.dddd.cccc  12345
1    B        ccc2222.ddd.aaa.ho16589.ddd  16589
2    C               ddd16893.aaa.de59875  59875
3    D           aaa15875.ccc.ddd.ho13532  13532
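
As for the error in the question: df[url].str.contains(...) returns a boolean Series with one value per row, so putting it in an if statement asks for a single truth value that pandas cannot give. The following is only a sketch of two alternatives, assuming the same df as in the question. The first uses a non-capturing group instead of the lookbehind and passes expand=False to get a Series back; the second is a row-wise version of the original function. The names ID_PATTERN, extract_id and ID_rowwise are made up for illustration.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "page": ["A", "B", "C", "D"],
    "URL": [
        "aaa.bbb3333.ccc.de12345.dddd.cccc",
        "ccc2222.ddd.aaa.ho16589.ddd",
        "ddd16893.aaa.de59875",
        "aaa15875.ccc.ddd.ho13532",
    ],
})

# Vectorized: (?:de|ho) matches the prefix without capturing it,
# (\d+) captures the digits that follow.
# expand=False returns a Series; rows without a match become NaN.
df["ID"] = df["URL"].str.extract(r"(?:de|ho)(\d+)", expand=False)

# Row-wise alternative (hypothetical helper): re.search works on one
# string at a time, so it is applied to each URL individually.
ID_PATTERN = re.compile(r"(?:de|ho)(\d+)")


def extract_id(url):
    """Return the digits after 'de' or 'ho', or 'not found'."""
    match = ID_PATTERN.search(url)
    return match.group(1) if match else "not found"


df["ID_rowwise"] = df["URL"].apply(extract_id)
print(df)
```

The vectorized form is usually the better choice on large frames; the row-wise form is mainly useful when a custom fallback such as 'not found' is wanted for each row.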

6 Comments

You need to delete the numbers from the URL as well.
Delete the numbers? What do you mean?
The resulting dataframe should have URLs without any additional numbers except the ID (I guess). So instead of aaa.bbb3333.ccc.de12345.dddd.cccc, it should be aaa.bbb.ccc.de12345.dddd.cccc.
Yes, you're right... but the result he/she is expecting doesn't have these numbers... anyway, good job :)
So sorry, guys. That was a mistake. I forgot to type the numbers in the URL. Thanks!