
I am developing a small script that checks each row of a DataFrame against a dictionary of regular expressions. When a pattern matches, I write the corresponding dictionary keys (category and subcategory) into a cell of a new column.

Where I get stuck is that I only retrieve one result per row, even though the row's shopping list satisfies several conditions.

What I have:

Name   shopping list                   cat_and_subcat
tom    apple , sirop , carotte         Fruit - Apple
nick   chocolate, banana, apple minie  Cake - Oreo
julie  juice                           Fruit - Lemon

What I should have:

Name   shopping list                   cat_and_subcat
tom    apple , sirop , carotte         Fruit - Apple, Cake - Carote cake
nick   chocolate, banana, apple minie  Cake - Oreo, Fruit - Apple
julie  juice                           Fruit - Lemon

How do I return every category whose condition is true, all in the same cell?

# Import libraries
import pandas as pd
import re

# initialize list of lists
data = [
    ["tom", "apple , sirop , carotte"],
    ["nick", "chocolate, banana, apple minie"],
    ["julie", "juice"],
]

# Create the pandas DataFrame
df = pd.DataFrame(data, columns=["Name", "shopping list"])

# print dataframe.
df


def create_dict():
    # Separator
    sep = " - "
    # cat
    fruit = "Fruit"
    cake = "Cake"
    # sub cat
    apple = "Apple"
    lemon = "Lemon"
    oreo = "Oreo"
    carote_cake = "Carote cake"

    category_uri_dict_reg_match = {
        fruit: {
            apple: ["apple", "apple minie"],
            lemon: ["juice"],
        },
        cake: {
            carote_cake: ["carotte", "betacarotten"],
            oreo: ["chocolate", "crunchy cake"],
        },
    }

    # Compile each regex once, up front, for performance
    category_dict_reg_match = {}
    for cat, cat_dict in category_uri_dict_reg_match.items():
        category_dict_reg_match[cat] = {}
        for sub_cat, raw_reg_list in cat_dict.items():
            reg_list = []
            for raw_regex in raw_reg_list:
                reg_list.append(re.compile(raw_regex))
            # print(reg_list)
            category_dict_reg_match[cat][sub_cat] = reg_list
    return category_dict_reg_match


dictio = create_dict()


def get_cat_and_subcat(topic):
    topic = re.sub("", "", topic)
    for cat, cat_dict in dictio.items():
        for sub_cat, reg_list in cat_dict.items():
            if any(compiled_reg.match(topic) for compiled_reg in reg_list):
                return cat + sep + sub_cat
    return "NO_MATCH"


df["cat_and_subcat"] = df.apply(
    lambda x: get_cat_and_subcat(x["shopping list"]), axis=1
)

df
  • Your code specifically returns as soon as it finds the first match. You have to accumulate the matches in a list or other structure. When you finish the loop, then you return the list. Look for a secondary tutorial on functions, or simply search how to return multiple values. Commented Feb 25, 2021 at 18:18

1 Answer

In Python, match() only tries to match at the beginning of the string, while search() looks for a match anywhere in the string (which is what Perl does by default). In addition, your code returns as soon as it finds the first match. You can modify your code as follows:

def get_cat_and_subcat(topic):
    sep = " - "
    topic = re.sub("", "", topic)
    # Collect every matching category instead of returning on the first one.
    results = []
    for cat, cat_dict in dictio.items():
        for sub_cat, reg_list in cat_dict.items():
            # search() finds the pattern anywhere in the string, not only at the start.
            if any(compiled_reg.search(topic) for compiled_reg in reg_list):
                results.append(cat + sep + sub_cat)
    if not results:
        return "NO_MATCH"
    return ", ".join(results)

Result:

    Name                   shopping list
0    tom         apple , sirop , carotte
1   nick  chocolate, banana, apple minie
2  julie                           juice
    Name                   shopping list                     cat_and_subcat
0    tom         apple , sirop , carotte  Fruit - Apple, Cake - Carote cake
1   nick  chocolate, banana, apple minie         Fruit - Apple, Cake - Oreo
2  julie                           juice                      Fruit - Lemon
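
To illustrate the difference between match() and search() described above, here is a minimal sketch using one pattern and one shopping list taken from the question. The variable names pattern, shopping_list and found are only illustrative.

```python
import re

# One of the raw patterns from the question's dictionary.
pattern = re.compile("carotte")
shopping_list = "apple , sirop , carotte"

# match() only succeeds if the pattern matches at position 0.
# The string starts with "apple", so there is no match here.
print(pattern.match(shopping_list))  # None

# search() scans the whole string and returns the first match it finds.
found = pattern.search(shopping_list)
print(found.group())  # carotte
print(found.span())   # (16, 23)
```

This is why the original function only ever reported the category whose keyword happened to be the first item in the list.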

1 Comment

It works great. The mistakes I made were forgetting to append the results and misusing .match(). It will not happen twice. Thank you very much for your help; it helps me enormously to understand.
