
My data is a list of products in a dataframe with other sales and ordering information.

product_cat_dict = {"T-Shirt": "T-Shirt",
                   "Top": "T-Shirt",
                   "Vest": "T-Shirt",
                   "Sweater": "Sweater"}

products = pd.DataFrame({"Product Name": ["T-Shirt White", "T-Shirt Black", "Top Orange", "Navy Vest", "Red Top", "Sweater Black"],
                        "Sales": [100, 200, 250, 50, 150, 300]}) 


I'm trying to add a new column to the dataframe that contains just the product category, extracted from the product name column. I'd also like some of the products to be grouped into the same category, as defined in the dictionary above.

My desired result is the following table:

enter image description here

I used a dictionary so that it's easy to update when new products with undefined categories are added to the data. From reading other SO posts, it looks like I need to use str.contains for partial substring matching, but I can't get it to return the matched value itself (rather than the original data). The best I could get was a Series of Boolean values, using the code below.

products["Product Name"].str.contains("|".join(product_cat_dict.keys()))

Any help on how I can get to my desired result would be much appreciated.

2 Answers


We could use a list comprehension to find which key of the category dictionary appears in the product name and return the corresponding value.

import pandas as pd 

product_cat_dict = {"T-Shirt": "T-Shirt",
                   "Top": "T-Shirt",
                   "Vest": "T-Shirt",
                   "Sweater": "Sweater"}

products = pd.DataFrame({"Product Name": ["T-Shirt White", "T-Shirt Black", "Top Orange", "Navy Vest", "Red Top", "Sweater Black"],
                        "Sales": [100, 200, 250, 50, 150, 300]}) 

# Take the category of the first dictionary key found in the name; None if no key matches
products['category'] = products['Product Name'].apply(lambda name: next((v for k, v in product_cat_dict.items() if k in name), None))

print(products)
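
A vectorized variant is also possible. The following is only a sketch, assuming every key is plain text that should be matched literally; the names pattern and matched_key are illustrative and not part of the original answer. It pulls the matched key out with str.extract (instead of the Boolean result that str.contains gives) and then translates it with map.

```python
import re

import pandas as pd

product_cat_dict = {"T-Shirt": "T-Shirt",
                    "Top": "T-Shirt",
                    "Vest": "T-Shirt",
                    "Sweater": "Sweater"}

products = pd.DataFrame({"Product Name": ["T-Shirt White", "T-Shirt Black", "Top Orange",
                                          "Navy Vest", "Red Top", "Sweater Black"],
                         "Sales": [100, 200, 250, 50, 150, 300]})

# One capture group containing all keys as alternatives.
# re.escape protects keys that contain regex metacharacters.
pattern = "(" + "|".join(re.escape(k) for k in product_cat_dict) + ")"

# str.extract returns the matched key itself (NaN when nothing matches);
# expand=False gives a Series rather than a one-column DataFrame.
matched_key = products["Product Name"].str.extract(pattern, expand=False)

# Translate each matched key into its category via the dictionary.
products["category"] = matched_key.map(product_cat_dict)

print(products)
```

Note that when a name contains more than one key, the regex picks the key that occurs earliest in the name, whereas the list comprehension picks the key that comes first in the dictionary, so the two approaches can differ in that case.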

2 Comments

Spot on! Thanks a lot, that's exactly what I was looking for. Although I think the first k in the list comprehension should be a v, because I wanted to return the value rather than the key from the dictionary.
I've changed the code to return the value rather than the key.

Assuming that the key is either the first or the second word, you could split the name and look up each word in the dictionary. This is not efficient, but it will probably work fine for a reasonably sized dataset.

products['pn1'] = [x.split()[0] for x in products['Product Name']]
products['pn2'] = [x.split()[1] for x in products['Product Name']]
products['Product Category'] = \
[product_cat_dict.get(x[0], product_cat_dict.get(x[1])) 
for x in zip(products['pn1'], products['pn2'])]
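
If the names can have any number of words, the same idea can be generalized by checking every word instead of only the first two. This is a rough sketch rather than a tested solution: the helper name category_from_words and the extra row "Long Sleeve Top Green" are made up for illustration, and it assumes each key appears as a whole whitespace-separated word.

```python
import pandas as pd

product_cat_dict = {"T-Shirt": "T-Shirt",
                    "Top": "T-Shirt",
                    "Vest": "T-Shirt",
                    "Sweater": "Sweater"}

# The last product name is a hypothetical example with more than two words.
products = pd.DataFrame({"Product Name": ["T-Shirt White", "Navy Vest", "Sweater Black",
                                          "Long Sleeve Top Green"]})


def category_from_words(name, mapping):
    """Return the category of the first word found in mapping, or None."""
    for word in name.split():
        if word in mapping:
            return mapping[word]
    return None


products["Product Category"] = [category_from_words(name, product_cat_dict)
                                for name in products["Product Name"]]

print(products)
```

This avoids the helper columns and does not fail on single-word names, but it will miss keys that are only part of a word.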

1 Comment

Thanks for your response. The product name column in my data sometimes has a description longer than two words, with the words in different orders; I probably should have put that in the OP. Norie has provided a solution that was able to extract the info I needed.
