
The first code block works. The second one raises no error, but it doesn't give the result I expected.

The first code block creates a new column ['Type']. Stores of the same kind but with different names get the same label in column ['Type']. For example, shop name A and shop name B are both in column ['Naam'], and the script labels both as 'Supermarket' in column ['Type']. So far so good.

The second code block is supposed to label every store, shop, etc. that is not named in the Namendict.test dictionary. I want these unrecognised shops/stores labelled as 'Diversen'. I hope someone has a suggestion. Thanks!

1: working code:

from Namendict import test

for value in df['Naam']:
    for i, (k, v) in enumerate(test.items()):
        boolean_indexer = df['Naam'].str.contains(k)
        df.loc[boolean_indexer, 'Type'] = v
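A self-contained sketch of the working approach, for anyone who wants to try it without the real data. The `test` dictionary and the DataFrame below are hypothetical stand-ins for `Namendict.test` and the original `df`. It also suggests that the outer `for value in df['Naam']` loop is not needed: each `str.contains(k)` call already checks every row, so one pass over the dictionary should give the same result.

```python
import pandas as pd

# Hypothetical stand-in for Namendict.test: part of a shop name -> shop type
test = {"Albert Heijn": "Supermarket", "Jumbo": "Supermarket", "Shell": "Fuel"}

# Hypothetical sample data
df = pd.DataFrame(
    {"Naam": ["Albert Heijn 1234", "Jumbo Utrecht", "Shell A12", "Bakkerij Jansen"]}
)

# One pass over the dictionary; str.contains already tests every row at once
for k, v in test.items():
    # regex=False treats the key as plain text, na=False skips missing names
    mask = df["Naam"].str.contains(k, regex=False, na=False)
    df.loc[mask, "Type"] = v

# Rows that no key matched are left as NaN in 'Type'
print(df)
```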

2: Code that is supposed to work (no error, but also no 'Diversen' in column ['Type'], just NaN):

from Namendict import test

for value in df['Naam']:
    for i, (k, v) in enumerate(test.items()):
        boolean_indexer = df['Naam'].str.contains(k)
        if True:
            df.loc[boolean_indexer, 'Type'] = v
        else:
            df.loc[boolean_indexer, 'Type'] = 'Diversen.'

Many thanks. Janneman

  • if True: always evaluates to True, so Diversen never gets set. I think you need to add an actual condition to check. Commented Jul 14, 2020 at 13:44
  • Oops... you're right, of course. What was I thinking... Thanks! Commented Jul 14, 2020 at 13:59
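As the first comment suggests, `if True:` needs a real condition. One possible sketch, assuming the goal is "label a row 'Diversen.' when no dictionary key matched it", is to record which rows matched any key and label the rest afterwards. The `test` dictionary, the sample data and the name `matched` are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for Namendict.test
test = {"Albert Heijn": "Supermarket", "Jumbo": "Supermarket", "Shell": "Fuel"}
df = pd.DataFrame(
    {"Naam": ["Albert Heijn 1234", "Jumbo Utrecht", "Shell A12", "Bakkerij Jansen"]}
)

# Tracks which rows matched at least one dictionary key
matched = pd.Series(False, index=df.index)

for k, v in test.items():
    mask = df["Naam"].str.contains(k, regex=False, na=False)
    df.loc[mask, "Type"] = v
    matched |= mask  # this is the condition that `if True:` was standing in for

# Every row that no key matched gets the fallback label
df.loc[~matched, "Type"] = "Diversen."
print(df)
```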

1 Answer


There are multiple ways to tackle this problem. The first option is simply to replace the NaN values afterwards with 'Diversen.', using pandas' fillna function. It looks like this:

from Namendict import test

# Loop over all entries in the dict
for k, v in test.items():
    boolean_indexer = df['Naam'].str.contains(k)
    df.loc[boolean_indexer, 'Type'] = v

# Fill in all empty (NaN) values with "Diversen."
df['Type'] = df['Type'].fillna("Diversen.")
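The same fillna idea can also be written without a Python loop. This is a swapped-in variant, not the answer's code: it builds one regex alternation from all keys, uses `str.extract` to pull out the matching key, maps it to its type and then fills the rest with 'Diversen.'. One difference to keep in mind: the loop above keeps the *last* matching key in dictionary order, while `str.extract` returns the leftmost match in the name. The data below is hypothetical.

```python
import re

import pandas as pd

# Hypothetical stand-in for Namendict.test
test = {"Albert Heijn": "Supermarket", "Jumbo": "Supermarket", "Shell": "Fuel"}
df = pd.DataFrame(
    {"Naam": ["Albert Heijn 1234", "Jumbo Utrecht", "Shell A12", "Bakkerij Jansen"]}
)

# Longer keys first, so a longer key beats a shorter key starting at the same position.
# re.escape stops characters such as "." or "(" in a key from acting as regex syntax.
keys = sorted(test, key=len, reverse=True)
pattern = "(" + "|".join(re.escape(k) for k in keys) + ")"

# One matching key per row (leftmost match), NaN where nothing matches
matched_key = df["Naam"].str.extract(pattern, expand=False)

# Translate key -> type, then label everything that stayed NaN
df["Type"] = matched_key.map(test).fillna("Diversen.")
print(df)
```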

Another option is to check whether the name exists in the test dictionary. If it does, the type stored in the dictionary can be written to the DataFrame. This loops over the unique names in the column instead of over all values, so the same action isn't executed multiple times.

from Namendict import test

for naam in df['Naam'].unique():  # Loop over all unique names in the DataFrame
    boolean_indexer = df['Naam'] == naam  # Rows with exactly this name

    # Note: `naam in test` is an exact match, so this only recognises names
    # that appear as dict keys exactly as written in df['Naam']
    if naam in test:  # Check if the name already exists in the dict
        # If True --> get the type from the dictionary
        df.loc[boolean_indexer, 'Type'] = test[naam]
    else:
        # If False --> fill in 'Diversen.'
        df.loc[boolean_indexer, 'Type'] = "Diversen."
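If the keys in test are only parts of the shop names (as the str.contains loop in the question suggests), an exact-match check is False for most names, so most rows would end up as 'Diversen.'. A sketch that keeps the "look up each unique name once" idea but checks for substrings instead; the helper `lookup_type`, the dictionary and the data are hypothetical. Unlike the str.contains loop, it keeps the *first* matching key in dictionary order.

```python
import pandas as pd

# Hypothetical stand-in for Namendict.test
test = {"Albert Heijn": "Supermarket", "Jumbo": "Supermarket", "Shell": "Fuel"}
df = pd.DataFrame(
    {"Naam": ["Albert Heijn 1234", "Jumbo Utrecht", "Shell A12", "Bakkerij Jansen"]}
)


def lookup_type(naam):
    """Hypothetical helper: type of the first key found inside naam, else 'Diversen.'."""
    if not isinstance(naam, str):  # missing names get the fallback label
        return "Diversen."
    for k, v in test.items():
        if k in naam:  # plain substring check, like str.contains(k, regex=False)
            return v
    return "Diversen."


# Look up each distinct name once, then spread the result to all rows with that name
types_per_name = {naam: lookup_type(naam) for naam in df["Naam"].unique()}
df["Type"] = df["Naam"].map(types_per_name)
print(df)
```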

1 Comment

Hey Theek, thanks! The first solution obviously works: simple, yet effective. Your second suggestion doesn't work, unfortunately. It sets almost all rows in ['Type'] to Diversen. I think I will simply use fillna; it works every time. Thanks again!
