I am developing a small script that retrieves values from a dictionary when a condition is met.
If the condition is fulfilled, I write the matching dictionary key into the corresponding cell of a column in my DataFrame.
However, I am stuck: I can only retrieve one value from the dictionary, even when a row's shopping list also satisfies other conditions.
What I have:
| Name | shopping list | cat_and_subcat |
|---|---|---|
| tom | apple , sirop , carotte | Fruit - Apple |
| nick | chocolate, banana, apple minie | Cake - Oreo |
| julie | juice | Fruit - Lemon |
What I should have:
| Name | shopping list | cat_and_subcat |
|---|---|---|
| tom | apple , sirop , carotte | Fruit - Apple, Cake - Carote cake |
| nick | chocolate, banana, apple minie | Cake - Oreo, Fruit - Apple |
| julie | juice | Fruit - Lemon |
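Part of the gap between the two tables is that `carotte` sits in the middle of tom's shopping list, while the code below tests each compiled pattern with `.match()`. A minimal check using tom's string from the question (the names `topic` and `pattern` are just for illustration) shows how `re.match` and `re.search` differ here:

```python
import re

# Tom's shopping list from the question
topic = "apple , sirop , carotte"
pattern = re.compile("carotte")

# match() only tries to match at position 0 of the string
print(pattern.match(topic))  # prints None
# search() scans the whole string for the first occurrence
print(pattern.search(topic) is not None)  # prints True
```

So with `.match()`, only a keyword at the very start of the list can ever be found, regardless of how many labels are collected.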
How can I return, in the same cell, every value whose condition is true?
```python
# Import libraries
import re

import pandas as pd

# Initialize list of lists
data = [
    ["tom", "apple , sirop , carotte"],
    ["nick", "chocolate, banana, apple minie"],
    ["julie", "juice"],
]

# Create the pandas DataFrame
df = pd.DataFrame(data, columns=["Name", "shopping list"])
# Print the DataFrame
print(df)

# Separator between category and subcategory (module level, so that
# get_cat_and_subcat() can use it)
sep = " - "


def create_dict():
    # Categories
    fruit = "Fruit"
    cake = "Cake"
    # Subcategories
    apple = "Apple"
    lemon = "Lemon"
    oreo = "Oreo"
    carote_cake = "Carote cake"
    category_uri_dict_reg_match = {
        fruit: {
            apple: ["apple", "apple minie"],
            lemon: ["juice"],
        },
        cake: {
            carote_cake: ["carotte", "betacarotten"],
            oreo: ["chocolate", "crunchy cake"],
        },
    }
    # Compile the regexes once, up front, for performance
    category_dict_reg_match = {}
    for cat, cat_dict in category_uri_dict_reg_match.items():
        category_dict_reg_match[cat] = {}
        for sub_cat, raw_reg_list in cat_dict.items():
            reg_list = [re.compile(raw_regex) for raw_regex in raw_reg_list]
            category_dict_reg_match[cat][sub_cat] = reg_list
    return category_dict_reg_match


dictio = create_dict()


def get_cat_and_subcat(topic):
    # As written, this substitution leaves topic unchanged
    topic = re.sub("", "", topic)
    for cat, cat_dict in dictio.items():
        for sub_cat, reg_list in cat_dict.items():
            # match() only matches at the start of the string
            if any(compiled_reg.match(topic) for compiled_reg in reg_list):
                # Returning here stops at the first matching subcategory
                return cat + sep + sub_cat
    return "NO_MATCH"


df["cat_and_subcat"] = df.apply(
    lambda x: get_cat_and_subcat(x["shopping list"]), axis=1
)
print(df)
```
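A minimal sketch of one way to put all labels into the same cell, assuming the same categories and keywords as in the question. It differs from `get_cat_and_subcat` in two ways: it uses `search()` instead of `match()`, so a keyword is found anywhere in the string, and it collects every matching label instead of returning on the first one. The labels are ordered by where their first keyword occurs in the shopping list, which reproduces the order in the expected table. The names `SEP`, `RAW_PATTERNS`, `COMPILED` and `get_all_cat_and_subcat` are hypothetical, chosen for this illustration.

```python
import re

import pandas as pd

SEP = " - "

# Same categories and keywords as in the question
RAW_PATTERNS = {
    "Fruit": {
        "Apple": ["apple", "apple minie"],
        "Lemon": ["juice"],
    },
    "Cake": {
        "Carote cake": ["carotte", "betacarotten"],
        "Oreo": ["chocolate", "crunchy cake"],
    },
}

# Compile every pattern once, up front
COMPILED = {
    cat: {sub_cat: [re.compile(p) for p in patterns] for sub_cat, patterns in subcats.items()}
    for cat, subcats in RAW_PATTERNS.items()
}


def get_all_cat_and_subcat(topic):
    """Return every 'cat - sub_cat' label whose keywords occur in topic, comma-separated."""
    hits = []  # (position of the earliest keyword match, label)
    for cat, cat_dict in COMPILED.items():
        for sub_cat, reg_list in cat_dict.items():
            # search() looks anywhere in the string, unlike match()
            positions = [m.start() for m in (reg.search(topic) for reg in reg_list) if m]
            if positions:
                # Collect the label instead of returning on the first hit
                hits.append((min(positions), cat + SEP + sub_cat))
    hits.sort()  # order labels as their keywords appear in the shopping list
    return ", ".join(label for _, label in hits) if hits else "NO_MATCH"


df = pd.DataFrame(
    [
        ["tom", "apple , sirop , carotte"],
        ["nick", "chocolate, banana, apple minie"],
        ["julie", "juice"],
    ],
    columns=["Name", "shopping list"],
)
df["cat_and_subcat"] = df["shopping list"].apply(get_all_cat_and_subcat)
print(df)
```

With this data, tom gets `Fruit - Apple, Cake - Carote cake`, nick gets `Cake - Oreo, Fruit - Apple` and julie gets `Fruit - Lemon`. If the order of the labels does not matter, the position bookkeeping can be dropped and the labels appended directly. Because the keywords are plain substrings, `apple` would also match a word such as `pineapple`; wrapping a keyword in word boundaries, e.g. `r"\bapple\b"`, restricts it to whole words.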