I have a pandas DataFrame with a column named url. The data looks like this:

row               url
1      'https://www.delish.com/cooking/recipe-ideas/recipes/four-cheese'
2      'https://www.delish.com/holiday-recipes/thanksgiving/thanksgiving-cabbage/'
3      'https://www.delish.com/kitchen-tools/cookware-reviews/advice/kitchen-tools-gadgets/'

I only need the first path segment after the domain, such as cooking or holiday-recipes.
Desired output:

row               url
1               cooking
2               holiday-recipes
3               kitchen-tools

I wanted to parse urls into different columns and then drop the columns that I don't need. Here is the code:

df['protocol'],df['domain'],df['path']=zip(*df['url'].map(urlparse(df['url']).urlsplit))

This raises: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). Is there a better way to do this? How can I extract just that one segment?

2 Answers

1

Is this what you're looking for?

df['url'] = df['url'].str.split('/').str[3]
print(df)

   row              url
0    1          cooking
1    2  holiday-recipes
2    3    kitchen-tools
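
As a side note on the error in the question: it most likely comes from calling urlparse on the whole Series (urlparse(df['url'])) instead of on one string at a time. If you would rather keep the urlparse route, a minimal sketch is below. The helper name first_path_segment and the new column name section are my own choices for illustration, not anything from the question.

```python
from urllib.parse import urlparse

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "url": [
        "https://www.delish.com/cooking/recipe-ideas/recipes/four-cheese",
        "https://www.delish.com/holiday-recipes/thanksgiving/thanksgiving-cabbage/",
        "https://www.delish.com/kitchen-tools/cookware-reviews/advice/kitchen-tools-gadgets/",
    ]
})


def first_path_segment(url: str) -> str:
    """Return the first non-empty path segment of a URL, or '' if there is none."""
    # urlparse works on a single string, so it is applied per row via map().
    segments = [s for s in urlparse(url).path.split("/") if s]
    return segments[0] if segments else ""


# Hypothetical column name; the original answer overwrites df['url'] instead.
df["section"] = df["url"].map(first_path_segment)
print(df["section"])
```

Unlike .str.split('/').str[3], this does not depend on the URL starting with a scheme and two slashes, but it is slower on large frames because it runs Python code per row.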

1 Comment

Precisely! Thank you very much. I have accepted the answer.
1

Another way is to match the run of lowercase letters and hyphens that comes immediately after com/:

df['url'] = df['url'].str.extract(r'((?<=com/)[a-z-]+)', expand=False)



               url
0          cooking
1  holiday-recipes
2    kitchen-tools
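
If the URLs are not all on a .com domain, or the segment can contain digits or uppercase letters, the lookbehind above will miss them. A possible variant, assuming every URL has the form scheme://host/segment/..., captures whatever sits between the first and second slash after the host. The column name section is only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "url": [
        "https://www.delish.com/cooking/recipe-ideas/recipes/four-cheese",
        "https://www.delish.com/holiday-recipes/thanksgiving/thanksgiving-cabbage/",
        "https://www.delish.com/kitchen-tools/cookware-reviews/advice/kitchen-tools-gadgets/",
    ]
})

# scheme, then "://", then the host (anything up to the next "/"),
# then capture the first path segment (anything up to the following "/").
pattern = r"^[a-z]+://[^/]+/([^/]+)"

# expand=False returns a Series for a single capture group;
# rows that do not match become NaN.
df["section"] = df["url"].str.extract(pattern, expand=False)
print(df["section"])
```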
