2

Thousands of values need to be replaced with a simpler naming format. For example, the original dataframe uses names like AB5648, CD5678, EF5468, etc., and these need to be replaced with HH_1, HH_2, HH_3 and so on, according to the correspondence table I created. The correspondence table contains the values to replace and the values to replace them with.

Original file = df_temp 

Filename = 'HH_number_Old.csv'
Filename = 'HH_number_New.csv'

Old                     New
AB1321                 HH_1
CD5678                 HH_2
EF5468                 HH_3
EF5468                 HH_3
EF5438                 HH_4
EF5368                 HH_5
EF5068                 HH_6
EF5468                 HH_7
EF5458                 HH_8
EF5168                 HH_9
.....                 .....
XZ5465                HH_3000

Here's what I tried.

for i in range(3000):
    print(HH_number_old[i])
    print(HH_number_new[i])

    temp_fin = df_temp.replace({HH_contract[i], HH_no[i]}, inplace=True)
    # temp_fin is the resultant dataframe with replaced values

Result: the temp_fin dataframe is empty.

Replacing works when I try a specific value of [i], as below.

temp_fin = df_temp.replace(HH_number_old[1], HH_number_new[1])

  • Is my solution working? Commented Nov 11, 2019 at 7:09
  • @jezrael, yes, thank you. But how do I do it using the correspondence table, which is another CSV? The reason I am trying to use the table is that the naming needs to be consistent across several files: if AB1234 is named HH_1 in one file, it should be the same in the others as well. Commented Nov 11, 2019 at 7:16
  • My answer was edited. Commented Nov 11, 2019 at 8:38

2 Answers

2

Use Series.rank:

df['new'] = 'HH_' + df['To_be_replaced'].rank(method='dense').astype(int).astype(str)

Or GroupBy.ngroup:

df['new'] = 'HH_' + df.groupby('To_be_replaced', sort=False).ngroup().add(1).astype(str)

print(df)
  To_be_replaced To_replace   new
0         AB1321       HH_1  HH_1
1         CD5678       HH_2  HH_2
2         EF5468       HH_3  HH_3
3         EF5468       HH_3  HH_3
4         EF5468       HH_3  HH_3
5         EF5468       HH_3  HH_3
6         EF5468       HH_3  HH_3
7         EF5468       HH_3  HH_3
8         EF5468       HH_3  HH_3
9         EF5468       HH_3  HH_3
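
Note that the two approaches number the groups differently: rank(method='dense') numbers the codes in their sorted order, while ngroup(sort=False) numbers them in order of first appearance. A minimal sketch of the difference, reusing the To_be_replaced column name from the example above:

import pandas as pd

df = pd.DataFrame({'To_be_replaced': ['CD5678', 'AB1321', 'AB1321', 'EF5468']})

# dense rank: numbering follows the sorted order of the codes
df['by_rank'] = 'HH_' + df['To_be_replaced'].rank(method='dense').astype(int).astype(str)

# ngroup(sort=False): numbering follows the order of first appearance
df['by_ngroup'] = 'HH_' + df.groupby('To_be_replaced', sort=False).ngroup().add(1).astype(str)

print(df)
#   To_be_replaced by_rank by_ngroup
# 0         CD5678    HH_2      HH_1
# 1         AB1321    HH_1      HH_2
# 2         AB1321    HH_1      HH_2
# 3         EF5468    HH_3      HH_3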

EDIT:

To replace values in multiple other DataFrames, first build a mapping dictionary:

d = dict(zip(df['To_be_replaced'], df['new']))

and then use Series.map on the other DataFrames:

df1['new'] = df1['To_be_replaced'].map(d)
df2['new'] = df2['To_be_replaced'].map(d)
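
If the correspondence table itself is stored in a CSV, the same dictionary can be built straight from that file. A minimal sketch, assuming the table sits in one file with Old and New columns (the filename HH_correspondence.csv and the column names are assumptions, not the question's actual files):

import pandas as pd

# Read the correspondence table; the Old/New layout follows the table shown in the question
corr = pd.read_csv('HH_correspondence.csv')

# Build the old -> new mapping; if an old code repeats, the last row wins
d = dict(zip(corr['Old'], corr['New']))

# Apply the same mapping consistently to every file that needs it
df1['new'] = df1['To_be_replaced'].map(d)
df2['new'] = df2['To_be_replaced'].map(d)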


1

I see that EF5468 is mapped in your question to both HH_3 and HH_7. I am guessing that this mapping should be unique (importing the table as a DataFrame and using a dictionary comprehension would collapse each old ID to a single key-value pair).

You can simply use a map for this:

mapping_dict = {
    'AB1321': 'HH_1',
    'CD5678': 'HH_2',
    'EF5468': 'HH_3',
    'EF5438': 'HH_4',
    'EF5368': 'HH_5',
    'EF5068': 'HH_6',
    'EF5458': 'HH_7',
    'EF5168': 'HH_8',
}

df['new'] = df['old'].map(mapping_dict)

This should achieve the results you want, assuming that I understood your question correctly (with each ID only occurring once) and that there exists a bijective (i.e. one-to-one and onto) mapping from the old IDs to the new IDs.
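
If you want to check those assumptions before applying the map, a minimal sketch (reusing mapping_dict and the old/new column names from the example above):

# Verify the mapping is one-to-one: no two old IDs may share a new ID
assert len(mapping_dict) == len(set(mapping_dict.values())), 'mapping is not one-to-one'

df['new'] = df['old'].map(mapping_dict)

# Old IDs missing from the table come through as NaN; list them explicitly
unmapped = df.loc[df['new'].isna(), 'old'].unique()
if len(unmapped):
    print('IDs without a correspondence entry:', unmapped)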

