2

Thousands of values need to be replaced with a simpler naming format. For example, the original dataframe uses names like AB5648, CD5678, EF5468, etc., and these need to be replaced with HH_1, HH_2, HH_3 and so on, according to the correspondence table I created. The correspondence table contains the values to replace and the values to replace them with.

Original file = df_temp 

Filename = 'HH_number_Old.csv'
Filename = 'HH_number_New.csv'

Old                     New
AB1321                 HH_1
CD5678                 HH_2
EF5468                 HH_3
EF5468                 HH_3
EF5438                 HH_4
EF5368                 HH_5
EF5068                 HH_6
EF5468                 HH_7
EF5458                 HH_8
EF5168                 HH_9
.....                 .....
XZ5465                HH_3000

Here's what I tried.

for i in range(3000):
    print(HH_number_old[i])
    print(HH_number_new[i])

    temp_fin = df_temp.replace({HH_contract[i], HH_no[i]}, inplace=True)
    # temp_fin is the resultant dataframe with replaced values

Result: the temp_fin dataframe is empty.

Replacing works when I try a specific value of [i], as below.

temp_fin = df_temp.replace(HH_number_old[1], HH_number_new[1])

  • Is my solution working? Commented Nov 11, 2019 at 7:09
  • @jezrael, yes, thank you. But how do I do it using the correspondence table, which is another CSV? The reason I am trying to use the table is that the naming needs to be consistent across several files: if AB1234 is named HH_1 in one file, it should be the same in the others as well. Commented Nov 11, 2019 at 7:16
  • My answer was edited. Commented Nov 11, 2019 at 8:38

2 Answers

2

Use Series.rank:

df['new'] = 'HH_' + df['To_be_replaced'].rank(method='dense').astype(int).astype(str)

Or GroupBy.ngroup:

df['new'] = 'HH_' + df.groupby('To_be_replaced', sort=False).ngroup().add(1).astype(str)

print(df)
  To_be_replaced To_replace   new
0         AB1321       HH_1  HH_1
1         CD5678       HH_2  HH_2
2         EF5468       HH_3  HH_3
3         EF5468       HH_3  HH_3
4         EF5468       HH_3  HH_3
5         EF5468       HH_3  HH_3
6         EF5468       HH_3  HH_3
7         EF5468       HH_3  HH_3
8         EF5468       HH_3  HH_3
9         EF5468       HH_3  HH_3
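
Note that the two approaches number the groups differently: rank(method='dense') numbers the codes in their sorted order, while ngroup(sort=False) numbers them in order of first appearance. A minimal sketch of the difference, reusing the To_be_replaced column name from the example above:

import pandas as pd

df = pd.DataFrame({'To_be_replaced': ['CD5678', 'AB1321', 'AB1321', 'EF5468']})

# dense rank: numbering follows the sorted order of the codes
df['by_rank'] = 'HH_' + df['To_be_replaced'].rank(method='dense').astype(int).astype(str)

# ngroup(sort=False): numbering follows the order of first appearance
df['by_ngroup'] = 'HH_' + df.groupby('To_be_replaced', sort=False).ngroup().add(1).astype(str)

print(df)
#   To_be_replaced by_rank by_ngroup
# 0         CD5678    HH_2      HH_1
# 1         AB1321    HH_1      HH_2
# 2         AB1321    HH_1      HH_2
# 3         EF5468    HH_3      HH_3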

EDIT:

To replace values in multiple other DataFrames, first build a mapping dictionary:

d = dict(zip(df['To_be_replaced'], df['new']))

and then use Series.map on the other DataFrames:

df1['new'] = df1['To_be_replaced'].map(d)
df2['new'] = df2['To_be_replaced'].map(d)
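
If the correspondence table itself is stored in a CSV, the same dictionary can be built straight from that file. A minimal sketch, assuming the table sits in one file with Old and New columns (the filename HH_correspondence.csv and the column names are assumptions, not the question's actual files):

import pandas as pd

# Read the correspondence table; the Old/New layout follows the table shown in the question
corr = pd.read_csv('HH_correspondence.csv')

# Build the old -> new mapping; if an old code repeats, the last row wins
d = dict(zip(corr['Old'], corr['New']))

# Apply the same mapping consistently to every file that needs it
df1['new'] = df1['To_be_replaced'].map(d)
df2['new'] = df2['To_be_replaced'].map(d)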


1

I see that EF5468 is mapped in your question to both HH_3 and HH_7. I am guessing that this mapping should be unique (importing the table as a DataFrame and using a dictionary comprehension would collapse each old ID to a single key-value pair).

You can simply use a map for this:

mapping_dict = {
    'AB1321': 'HH_1',
    'CD5678': 'HH_2',
    'EF5468': 'HH_3',
    'EF5438': 'HH_4',
    'EF5368': 'HH_5',
    'EF5068': 'HH_6',
    'EF5458': 'HH_7',
    'EF5168': 'HH_8',
}

df['new'] = df['old'].map(mapping_dict)

This should achieve the results you want, assuming that I understood your question correctly (with each ID only occurring once) and that there exists a bijective (i.e. one-to-one and onto) mapping from the old IDs to the new IDs.
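
If you want to check those assumptions before applying the map, a minimal sketch (reusing mapping_dict and the old/new column names from the example above):

# Verify the mapping is one-to-one: no two old IDs may share a new ID
assert len(mapping_dict) == len(set(mapping_dict.values())), 'mapping is not one-to-one'

df['new'] = df['old'].map(mapping_dict)

# Old IDs missing from the table come through as NaN; list them explicitly
unmapped = df.loc[df['new'].isna(), 'old'].unique()
if len(unmapped):
    print('IDs without a correspondence entry:', unmapped)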

