I have a column of URLs and would like to extract the digits that come after "/show/" and before the next "/". I would like these digits stored as integers.

sn    URL
1     https://tvseries.net/show/51/johnny155
2     https://tvseries.net/show/213/kimble2
3     https://tvseries.net/show/46/forceps
4     https://tvseries.net/show/90/tr9
5     https://tvseries.net/show/22/candlenut

The expected output is:

sn    URL                                          show_id
1     https://tvseries.net/show/51/johnny155       51
2     https://tvseries.net/show/213/kimble2        213
3     https://tvseries.net/show/46/forceps         46 
4     https://tvseries.net/show/90/tr9             90
5     https://tvseries.net/show/22/candlenut       22

So far I've tried the code below to retrieve the digits after "show". It produces a column where each show_id is wrapped in brackets (i.e., [51], [213]), and the column's type is pandas.core.series.Series.

Is there a more efficient way to get the show_id as an integer, without the brackets? I'd appreciate any help, thank you.

import urllib.parse as urlparse

df['protocol'],df['domain'],df['path'], df['query'], df['fragment'] = zip(*df['URL'].map(urlparse.urlsplit))

df['UID'] = df['path'].str.findall(r'(?<=show)[^,.\d\n]+?(\d+)')

1 Answer

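You can use str.extract with a capture group to match the digits between the forward slashes after show. The captured values are strings, so convert them with astype(int) to get integers:

import pandas as pd

df = pd.DataFrame({ 'sn' : [1, 2, 3, 4, 5], 
                   'URL': ['https://tvseries.net/show/51/johnny155',
                           'https://tvseries.net/show/213/kimble2',
                           'https://tvseries.net/show/46/forceps',
                           'https://tvseries.net/show/90/tr9',
                           'https://tvseries.net/show/22/candlenut'
                           ]})
# Capture the digits between "show/" and the next "/".
# A raw string keeps \d from being treated as a string escape;
# expand=False returns a Series instead of a one-column DataFrame.
df['show_id'] = df['URL'].str.extract(r'show/(\d+)/', expand=False).astype(int)
df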

Output

   sn                                     URL show_id
0   1  https://tvseries.net/show/51/johnny155      51
1   2   https://tvseries.net/show/213/kimble2     213
2   3    https://tvseries.net/show/46/forceps      46
3   4        https://tvseries.net/show/90/tr9      90
4   5  https://tvseries.net/show/22/candlenut      22
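
If some URLs might not contain a /show/<digits>/ segment, str.extract leaves NaN in those rows and a plain astype(int) would raise. Below is a minimal sketch of one way to handle that with pandas' nullable Int64 dtype. The sample rows without a show id are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample: the last two rows have no "/show/<digits>/" segment
df = pd.DataFrame({
    'sn': [1, 2, 3, 4],
    'URL': [
        'https://tvseries.net/show/51/johnny155',
        'https://tvseries.net/show/213/kimble2',
        'https://tvseries.net/about',
        None,
    ],
})

# expand=False returns a Series; rows that do not match become missing values
extracted = df['URL'].str.extract(r'/show/(\d+)/', expand=False)

# to_numeric parses the digit strings; "Int64" (capital I) is the nullable
# integer dtype, so missing rows stay <NA> instead of forcing a float column
df['show_id'] = pd.to_numeric(extracted).astype('Int64')
```

Whether to keep the missing rows as <NA>, drop them, or fill them with a sentinel depends on what the rest of the pipeline expects.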