0

I have this dataframe (sample):

d = {'Col1': [10, 10, 20, 38, 10, 100, 45, 8, 18, 30, 10], 'Col2': [1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0], 'Col3': ['AA', 'BB', 'AA', 'BB', 'BB', 'BB', 'CC', 'AA', 'DD', 'CC', 'BB']}
df = pd.DataFrame(data=d)

The strings in Col3 need to be replaced with a value where this value is consistent over all rows for the strings that are the same. The output should look like a dataframe of this kind with Col3_nw:

d_target = {'Col1': [10, 10, 20, 38, 10, 100, 45, 8, 18, 30, 10], 'Col2': [1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0], 'Col3': ['AA', 'BB', 'AA', 'BB', 'BB', 'BB', 'CC', 'AA', 'DD', 'CC', 'BB], 'Col3_nw': [1, 2, 1, 2, 2, 2, 3, 1, 4, 3, 2]}
df_target = pd.DataFrame(data=d_target)

I tried to sort the dataframe on Col3 and loop through it, adding the same counter value while the Col3 value is the same, and adding 1 to the counter value when a new Col3 value shows up.

df_temp = df.sort_values('Col3')

String_value = df_temp['Col3'].iloc[0]

Counter = 1
df_temp.loc[0,'Col3_nw'] = Counter

for index, row in df_temp.iterrows():
    if df_temp.iloc[index,'Col3'] == String_value:
       df_temp.loc[index,'Col3_nw'] = Counter
    if df_temp.iloc[index,'Col3'] != String_value:     
        Counter = Counter + 1
        df_temp.loc[index,'Col3_nw'] = Counter

This code provides this error which I do not understand:

ValueError: Location based indexing can only have [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array] types

Can anybody help me getting this to work in Python?

Thanks so much!

1 Answer 1

1

you can pass a dictionary to replace:

d = {'Col1': [10, 10, 20, 38, 10, 100, 45, 8, 18, 30, 10], 
     'Col2': [1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0], 
     'Col3': ['AA', 'BB', 'AA', 'BB', 'BB', 'BB', 'CC', 'AA', 'DD', 'CC', 'BB']}

df = pd.DataFrame(data=d)

df['Col3'] = df.Col3.replace({'AA': 1, 'BB': 2, 'CC': 3, 'DD': 4})

Edit: Since you're just trying to map the strings to integers just convert the column to a categorical and use the associated codes:

df['Col3'] = df.Col3.astype('categorical').cat.codes
Sign up to request clarification or add additional context in comments.

4 Comments

Thanks! The values in Col3 have many possible values so it is not possible to hardcode the dictionary, how would I create the dictionary automatically?
Are you just trying to map each unique string to an integer? How are you arriving at the values that each string maps to?
I am trying to map each unique string to an integer, it does not matter what integer as long as each time a string occurs it refers to the same integer and no other strings refer to that same integer
in that case you can just covert the column to a categorical and use the codes that that generates. See my edit

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.