I have data stored in a pandas DataFrame. One of the columns, "Product", contains the brand name and model (e.g. Nike Air Jordan, Adidas Gazelle). I want to create a new column that contains just the brand (e.g. Nike, Adidas), which I will later use in a groupby to summarize the data. From my research, I believe str.contains and a regex can be used to do this, but my implementation has not worked. I've also seen different approaches: some use a "for i in range" loop, while others do it as a replace in a single line of code.

import pandas as pd
import numpy as np

shoes_df = pd.DataFrame({'Product': ['Nike vaporfly', 'Nike Jordans', 'Adidas supernova', 'Asics Kayano',
                                     'Asics GT2010', 'Adidas gazelle', 'Nike air max', 'Nike Lebron'],
                         'Unit sales': [1500, 1600, 2341, 1345, 4523, 2345, 1634, 3129]})

shoes_df['Brand'] = np.where(shoes_df['Product'].str.contains('Nike.*|Adidas.*').any(), 'Nike|Adidas', np.nan)

print(shoes_df)

Here is my attempt at the "for i in range" approach, which did not work either. It raised the error "TypeError: 'Series' objects are mutable, thus they cannot be hashed":

shoes_df = pd.DataFrame({'Product': ['Nike vaporfly', 'Nike Jordans', 'Adidas supernova', 'Asics Kayano',
                                     'Asics GT2010', 'Adidas gazelle', 'Nike air max', 'Nike Lebron'],
                         'Unit sales': [1500, 1600, 2341, 1345, 4523, 2345, 1634, 3129]})

for i in shoes_df.iterrows():
    if shoes_df['Product'].str.contains('Nike').any():
        shoes_df.set_value(i, 'Brand', 'Nike')
    elif shoes_df['Product'].str.contains('Adidas').any():
        shoes_df.set_value(i, 'Brand', 'Adidas')
    elif shoes_df['Product'].str.contains('Asics').any():
        shoes_df.set_value(i, 'Brand', 'Asics')
    else:
        shoes_df.set_value(i, 'Brand', np.nan)

3 Answers

IIUC:

shoes_df['brand'] = shoes_df.Product.str.extract(pat='(Nike|Adidas|Asics)', expand=False)

Output:

            Product  Unit sales   brand
0     Nike vaporfly        1500    Nike
1      Nike Jordans        1600    Nike
2  Adidas supernova        2341  Adidas
3      Asics Kayano        1345   Asics
4      Asics GT2010        4523   Asics
5    Adidas gazelle        2345  Adidas
6      Nike air max        1634    Nike
7       Nike Lebron        3129    Nike
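
Since the question mentions using the new column in a groupby, here is a minimal sketch of that follow-up step. It rebuilds the question's data, applies the same extract call, and sums unit sales per brand; the variable name sales_by_brand is just an illustrative choice, not something from the question.

```python
import pandas as pd

shoes_df = pd.DataFrame({
    'Product': ['Nike vaporfly', 'Nike Jordans', 'Adidas supernova', 'Asics Kayano',
                'Asics GT2010', 'Adidas gazelle', 'Nike air max', 'Nike Lebron'],
    'Unit sales': [1500, 1600, 2341, 1345, 4523, 2345, 1634, 3129],
})

# Pull out whichever known brand name appears in each product string.
shoes_df['brand'] = shoes_df['Product'].str.extract('(Nike|Adidas|Asics)', expand=False)

# Total unit sales per brand (sales_by_brand is an illustrative name).
sales_by_brand = shoes_df.groupby('brand')['Unit sales'].sum()
print(sales_by_brand)
```

Rows whose product matches none of the listed brands get NaN in the brand column, and groupby leaves NaN keys out by default.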

1 Comment

This did the trick and was easy to follow. I had missed using extract. Thank you. I also tested it when the brand wasn't the first word and it still worked.

Option 1 (the hard way)
str.extract

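df = shoes_df  # the DataFrame from the question
brands = ['Nike', 'Adidas', 'Asics']
# Build the pattern '(Nike|Adidas|Asics)' from the list; expand=False returns a Series
df['Brand'] = df.Product.str.extract('({})'.format('|'.join(brands)), expand=False)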

df

            Product  Unit sales   Brand
0     Nike vaporfly        1500    Nike
1      Nike Jordans        1600    Nike
2  Adidas supernova        2341  Adidas
3      Asics Kayano        1345   Asics
4      Asics GT2010        4523   Asics
5    Adidas gazelle        2345  Adidas
6      Nike air max        1634    Nike
7       Nike Lebron        3129    Nike
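
If the brand list might contain regex metacharacters, or the product strings are not consistently capitalised, Option 1 could be hardened roughly as follows. This is only a sketch: the sample rows (including 'Puma suede' and the lowercase 'adidas') are made up for illustration, and the canonical lookup dict is a hypothetical helper, not part of the answer.

```python
import re
import pandas as pd

# Illustrative rows only; they are not the question's data.
df = pd.DataFrame({'Product': ['Nike vaporfly', 'adidas supernova',
                               'New Asics Kayano', 'Puma suede']})
brands = ['Nike', 'Adidas', 'Asics']

# Escape each brand so characters such as '.' or '+' are matched literally.
pattern = '({})'.format('|'.join(re.escape(b) for b in brands))

# flags=re.IGNORECASE lets 'adidas' match 'Adidas'; expand=False returns a Series.
found = df['Product'].str.extract(pattern, flags=re.IGNORECASE, expand=False)

# Map matches back to the spelling used in `brands`, so a later groupby
# treats 'adidas' and 'Adidas' as the same brand. Unmatched rows stay NaN.
canonical = {b.lower(): b for b in brands}
df['Brand'] = found.str.lower().map(canonical)
print(df)
```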

Option 2 (somewhat simpler)
str.split

df['Brand'] = df.Product.str.split().str[0]
df

            Product  Unit sales   Brand
0     Nike vaporfly        1500    Nike
1      Nike Jordans        1600    Nike
2  Adidas supernova        2341  Adidas
3      Asics Kayano        1345   Asics
4      Asics GT2010        4523   Asics
5    Adidas gazelle        2345  Adidas
6      Nike air max        1634    Nike
7       Nike Lebron        3129    Nike

You can extend this a bit to replace anything that isn't in brands with NaN:

df['Brand'] = np.where(df.Brand.isin(brands), df.Brand, np.nan)
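
An equivalent way to write that last step, as a sketch, is Series.where, which keeps the values where the condition holds and fills NaN elsewhere without going through NumPy. The three sample rows are illustrative (the last one deliberately puts the brand second), and first_word is a name introduced here.

```python
import pandas as pd

# Illustrative rows; 'Mens Adidas gazelle' is made up to show a non-leading brand.
df = pd.DataFrame({'Product': ['Nike vaporfly', 'Asics GT2010', 'Mens Adidas gazelle']})
brands = ['Nike', 'Adidas', 'Asics']

# Option 2: take the first whitespace-separated word of each product.
first_word = df['Product'].str.split().str[0]

# Keep the word only if it is a known brand; everything else becomes NaN.
df['Brand'] = first_word.where(first_word.isin(brands))
print(df)
```

The third row ends up as NaN rather than 'Mens', so a brand that is not the first word is at least flagged instead of silently mislabelled.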

3 Comments

Thanks. First one worked across different iterations. Option 2 worked when the brand was the first word, but if the brand came later in the string, it returned another word. Option 1 has worked regardless of where the brand was.
@skibbereen Which is why I provided option 1 before option 2 ;/
@cᴏʟᴅsᴘᴇᴇᴅ - bad dupe, stackoverflow.com/q/47292599/2901002, please find matched or open question.

If you can assume that the brand is always the first word, this solution gives you the flexibility to capture brands beyond a known list, so I'm adding it for interest:

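# Raw string so that \s reaches the regex engine as written; expand=False returns a Series
shoes_df['Brand'] = shoes_df['Product'].str.extract(r'^([^\s]+)\s', expand=False)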
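
One thing to watch with this pattern is that it requires whitespace after the first word, so a product consisting of a single word would not match. A small sketch of the difference, using made-up product names ('Reebok classic', 'Converse') and illustrative variable names:

```python
import pandas as pd

# Illustrative products, including one with no model name after the brand.
products = pd.Series(['Nike vaporfly', 'Reebok classic', 'Converse'])

# Requires whitespace after the first word, like the pattern above.
with_space = products.str.extract(r'^([^\s]+)\s', expand=False)

# Drops that requirement, so single-word products are captured too.
first_word = products.str.extract(r'^(\S+)', expand=False)

print(with_space.tolist())
print(first_word.tolist())
```

Which variant is preferable depends on whether a bare brand name with no model should count as a match.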
