How can I replace repeating string in dataframe column with a different value (repeating strings should have the same new value)

Question

I have this dataframe (sample):

d = {'Col1': [10, 10, 20, 38, 10, 100, 45, 8, 18, 30, 10], 'Col2': [1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0], 'Col3': ['AA', 'BB', 'AA', 'BB', 'BB', 'BB', 'CC', 'AA', 'DD', 'CC', 'BB']}
df = pd.DataFrame(data=d)

The strings in Col3 need to be replaced with a value where this value is consistent over all rows for the strings that are the same. The output should look like a dataframe of this kind with Col3_nw:

d_target = {'Col1': [10, 10, 20, 38, 10, 100, 45, 8, 18, 30, 10], 'Col2': [1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0], 'Col3': ['AA', 'BB', 'AA', 'BB', 'BB', 'BB', 'CC', 'AA', 'DD', 'CC', 'BB], 'Col3_nw': [1, 2, 1, 2, 2, 2, 3, 1, 4, 3, 2]}
df_target = pd.DataFrame(data=d_target)

I tried to sort the dataframe on Col3 and loop through it, adding the same counter value while the Col3 value is the same, and adding 1 to the counter value when a new Col3 value shows up.

df_temp = df.sort_values('Col3')

String_value = df_temp['Col3'].iloc[0]

Counter = 1
df_temp.loc[0,'Col3_nw'] = Counter

for index, row in df_temp.iterrows():
    if df_temp.iloc[index,'Col3'] == String_value:
       df_temp.loc[index,'Col3_nw'] = Counter
    if df_temp.iloc[index,'Col3'] != String_value:     
        Counter = Counter + 1
        df_temp.loc[index,'Col3_nw'] = Counter

This code provides this error which I do not understand:

ValueError: Location based indexing can only have [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array] types

Can anybody help me getting this to work in Python?

Thanks so much!

dan_g · Accepted Answer · 2019-03-27 21:32:51Z

1

you can pass a dictionary to replace:

d = {'Col1': [10, 10, 20, 38, 10, 100, 45, 8, 18, 30, 10], 
     'Col2': [1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0], 
     'Col3': ['AA', 'BB', 'AA', 'BB', 'BB', 'BB', 'CC', 'AA', 'DD', 'CC', 'BB']}

df = pd.DataFrame(data=d)

df['Col3'] = df.Col3.replace({'AA': 1, 'BB': 2, 'CC': 3, 'DD': 4})

Edit: Since you're just trying to map the strings to integers just convert the column to a categorical and use the associated codes:

df['Col3'] = df.Col3.astype('categorical').cat.codes

edited Mar 27, 2019 at 21:32

answered Mar 26, 2019 at 20:41

dan_g

2,8156 gold badges30 silver badges44 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

Suzanne Over a year ago

Thanks! The values in Col3 have many possible values so it is not possible to hardcode the dictionary, how would I create the dictionary automatically?

dan_g Over a year ago

Are you just trying to map each unique string to an integer? How are you arriving at the values that each string maps to?

Suzanne Over a year ago

I am trying to map each unique string to an integer, it does not matter what integer as long as each time a string occurs it refers to the same integer and no other strings refer to that same integer

dan_g Over a year ago

in that case you can just covert the column to a categorical and use the codes that that generates. See my edit

Collectives™ on Stack Overflow

How can I replace repeating string in dataframe column with a different value (repeating strings should have the same new value)

1 Answer 1

4 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

4 Comments

Your Answer

Sign up or log in

Post as a guest

Related