I have a pandas DataFrame similar to the one below:
Name       URL
X          http://www.x.com/abc/xyz/url.html
X          http://www.x.com/yyz/hue/end.html
Othername  http://website.othername.com/abc.html
Othername  http://home.othername.com/someword/word.html
Example    http://www.example.com/999/something/index.html
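For anyone who wants to reproduce this, the sample frame can be built roughly like so (a minimal sketch; the real data has more rows and 6 distinct names, only three of which appear here):

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Name": ["X", "X", "Othername", "Othername", "Example"],
    "URL": [
        "http://www.x.com/abc/xyz/url.html",
        "http://www.x.com/yyz/hue/end.html",
        "http://website.othername.com/abc.html",
        "http://home.othername.com/someword/word.html",
        "http://www.example.com/999/something/index.html",
    ],
})

print(df)
```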
I want to add an "Extract" column (using regex, I guess), as below:
Name       URL                                              Extract
X          http://www.x.com/abc/xyz/url.html                abc
X          http://www.x.com/yyz/hue/end.html                yyz
Othername  http://website.othername.com/abc.html            website
Othername  http://home.othername.com/someword/word.html     home
Example    http://www.example.com/999/something/index.html  999
As you can see, the part I want to extract varies by website. For rows where 'Name' is 'X', I'd have to apply one regex pattern; for 'Othername', another pattern.
I have 6 different names (and 6 different patterns) for this.
I tried using np.where, but I could only make it work for one of the websites, not for multiple conditions. As follows:
df['Extract'] = np.where(df['Name'] == 'X', df.URL.str.extract(r'www\.x\.com\/(.*?)/'),'')
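One way the single-condition idea might extend to several names is to loop over a name-to-pattern mapping and fill in only the matching rows each time. This is just a sketch under assumptions: the `patterns` dict is a hypothetical helper, the 'Othername' pattern is my guess at what would capture the subdomain, and only three of the 6 names are shown. `expand=False` is passed so that `str.extract` returns a Series rather than a one-column DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["X", "X", "Othername", "Othername", "Example"],
    "URL": [
        "http://www.x.com/abc/xyz/url.html",
        "http://www.x.com/yyz/hue/end.html",
        "http://website.othername.com/abc.html",
        "http://home.othername.com/someword/word.html",
        "http://www.example.com/999/something/index.html",
    ],
})

# Hypothetical mapping: one pattern per 'Name' value, each with exactly
# one capture group. The 'Othername' pattern is an assumption.
patterns = {
    "X": r"www\.x\.com/(.*?)/",
    "Example": r"www\.example\.com/(.*?)/",
    "Othername": r"//(.*?)\.othername\.com",
}

# Start with an empty string so names without a pattern stay blank.
df["Extract"] = ""
for name, pattern in patterns.items():
    mask = df["Name"] == name
    # Extract only from the rows belonging to this name.
    df.loc[mask, "Extract"] = df.loc[mask, "URL"].str.extract(pattern, expand=False)

print(df)
```

Rows whose URL does not match their pattern would end up as NaN here rather than an empty string.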
I also tried creating a function for this:
def ext(c):
    if c['Name'] == 'X':
        c.URL.str.extract(r'www\.x\.com\/(.*?)/')
    elif c['Name'] == 'Example':
        c.URL.str.extract(r'www\.example\.com\/(.*?)/')
    (...)
    else:
        return ''
df['Extract'] = df.apply(ext)
df
How can I make this work for the different strings I have under 'Name'?
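For reference, here is roughly what I think the function-based version would need to look like, though I'm not sure it is the best approach. It is a sketch under assumptions: inside a row-wise `apply` each value is a plain string, so it uses `re.search` instead of the `.str` accessor, every branch returns a value, and `axis=1` is passed so the function receives rows. The `patterns` dict and `extract_part` are hypothetical names, and the 'Othername' pattern is a guess.

```python
import re
import pandas as pd

df = pd.DataFrame({
    "Name": ["X", "X", "Othername", "Othername", "Example"],
    "URL": [
        "http://www.x.com/abc/xyz/url.html",
        "http://www.x.com/yyz/hue/end.html",
        "http://website.othername.com/abc.html",
        "http://home.othername.com/someword/word.html",
        "http://www.example.com/999/something/index.html",
    ],
})

# Hypothetical mapping from 'Name' to its regex (one capture group each).
patterns = {
    "X": r"www\.x\.com/(.*?)/",
    "Example": r"www\.example\.com/(.*?)/",
    "Othername": r"//(.*?)\.othername\.com",  # assumed pattern
}

def extract_part(row):
    """Return the captured part of row['URL'] for row['Name'], or ''."""
    pattern = patterns.get(row["Name"])
    if pattern is None:
        return ""  # no pattern defined for this name
    match = re.search(pattern, row["URL"])
    return match.group(1) if match else ""

# axis=1 makes apply pass each row (not each column) to the function.
df["Extract"] = df.apply(extract_part, axis=1)

print(df)
```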