I have data stored in a pandas DataFrame. One of the columns, "Product", contains the brand name and model (e.g. Nike Air Jordan, Adidas Gazelle). I want to create a new column that contains only the brand (e.g. Nike, Adidas), which I will later use in a groupby to summarize the data. From my research, I believe str.contains with a regex can be used to do this, but my implementation has not worked. I've also seen different approaches: some loop over the rows with "for i in range", while others do the replacement in a single line of code.
import pandas as pd
import numpy as np
shoes_df = pd.DataFrame({
    'Product': ['Nike vaporfly', 'Nike Jordans', 'Adidas supernova', 'Asics Kayano',
                'Asics GT2010', 'Adidas gazelle', 'Nike air max', 'Nike Lebron'],
    'Unit sales': [1500, 1600, 2341, 1345, 4523, 2345, 1634, 3129],
})
shoes_df['Brand'] = np.where(shoes_df['Product'].str.contains('Nike.*|Adidas.*').any(), 'Nike|Adidas', np.nan)
print(shoes_df)
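For reference, here is a minimal sketch of what the .any() call in the line above does. The three-row Series is made up for illustration. str.contains returns one boolean per row, while .any() reduces that to a single boolean, so np.where receives one condition for the whole column rather than one per row.

```python
import pandas as pd

# Hypothetical three-row sample, not the full data set
products = pd.Series(['Nike vaporfly', 'Asics Kayano', 'Adidas gazelle'])

# Element-wise mask: one boolean per row
mask = products.str.contains('Nike|Adidas')
print(mask.tolist())

# .any() collapses the whole mask into a single scalar
collapsed = bool(mask.any())
print(collapsed)
```

The trailing ".*" in the original pattern is not needed for this check, since str.contains already matches anywhere in the string.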
Here is my attempt at the loop approach, which did not work either. It raised the error "TypeError: 'Series' objects are mutable, thus they cannot be hashed".
shoes_df = pd.DataFrame({
    'Product': ['Nike vaporfly', 'Nike Jordans', 'Adidas supernova', 'Asics Kayano',
                'Asics GT2010', 'Adidas gazelle', 'Nike air max', 'Nike Lebron'],
    'Unit sales': [1500, 1600, 2341, 1345, 4523, 2345, 1634, 3129],
})
for i in shoes_df.iterrows():
    if shoes_df['Product'].str.contains('Nike').any():
        shoes_df.set_value(i, 'Brand', 'Nike')
    elif shoes_df['Product'].str.contains('Adidas').any():
        shoes_df.set_value(i, 'Brand', 'Adidas')
    elif shoes_df['Product'].str.contains('Asics').any():
        shoes_df.set_value(i, 'Brand', 'Asics')
    else:
        shoes_df.set_value(i, 'Brand', np.nan)
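One possible single-line approach, sketched under the assumption that the set of brands is known in advance: build a regex with one capture group and let str.extract pull the matching brand out of each row. The names brands, pattern, and sales_by_brand are introduced here for illustration only. Rows that match none of the listed brands would get NaN.

```python
import pandas as pd

shoes_df = pd.DataFrame({
    'Product': ['Nike vaporfly', 'Nike Jordans', 'Adidas supernova', 'Asics Kayano',
                'Asics GT2010', 'Adidas gazelle', 'Nike air max', 'Nike Lebron'],
    'Unit sales': [1500, 1600, 2341, 1345, 4523, 2345, 1634, 3129],
})

# Assumed list of known brands (hypothetical; extend as needed)
brands = ['Nike', 'Adidas', 'Asics']
pattern = '(' + '|'.join(brands) + ')'

# expand=False returns a Series holding the single capture group for each row
shoes_df['Brand'] = shoes_df['Product'].str.extract(pattern, expand=False)

# Summarize unit sales per brand
sales_by_brand = shoes_df.groupby('Brand')['Unit sales'].sum()
print(sales_by_brand)
```

If the brand is always the first word of "Product", shoes_df['Product'].str.split().str[0] should give the same column without needing a brand list, at the cost of no NaN fallback for unexpected values.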