I have a df with a variable named url. Each string in url contains a unique six-character alphanumeric ID. I've been trying to extract that specific part of each string, the article_id, from all urls and then add it to the df as a new variable.
For example, xwpd7w is the article_id for https://www.vice.com/en_us/article/xwpd7w/how-a-brooklyn-gang-may-have-gotten-crazy-rich-dealing-for-el-chapo
How do I extract the article_ids from all urls in the df based on their position right after /article/? Any method is fine, regex or not.
I have so far done the following:
df.url.str.split()
Example output: [https://www.vice.com/en_au/article/j539yy/smo...
df['cutcurls'] = df.url.str.join(sep=' ')
Example output: h t t p s : / / w w w . v i c e . c o m / e n
Any ideas?
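For reference, here is a minimal sketch of the kind of result I'm after. It assumes the ID is always the path segment immediately following /article/. The second and third URLs are hypothetical, made up only to show the behaviour; the column name article_id is my own choice. It shows one regex route (str.extract) and one non-regex route (chained str.split), which may or may not be the best way to do this:

```python
import pandas as pd

df = pd.DataFrame({
    "url": [
        "https://www.vice.com/en_us/article/xwpd7w/how-a-brooklyn-gang-may-have-gotten-crazy-rich-dealing-for-el-chapo",
        "https://www.vice.com/en_au/article/abc123/some-hypothetical-slug",  # hypothetical URL
        "https://www.vice.com/en_us/about",  # hypothetical URL with no /article/ part
    ]
})

# Regex route: capture everything after "/article/" up to the next "/".
# expand=False returns a Series because there is a single capture group.
df["article_id"] = df["url"].str.extract(r"/article/([^/]+)", expand=False)

# Non-regex route: split on "/article/", take the part after it,
# then split on "/" and keep the first piece.
df["article_id_split"] = (
    df["url"].str.split("/article/").str[1].str.split("/").str[0]
)

print(df[["article_id", "article_id_split"]])
```

Rows whose URL has no /article/ segment come out as NaN in both columns rather than raising an error. As far as I can tell, my original attempts did not work because str.split() with no arguments splits on whitespace, which a URL does not contain, and str.join(sep=' ') applied to plain strings puts the separator between every character.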