Modify pandas data frame values based on condition

Question

Given a data frame with 9,000+ records (rows), a column with unique IDs, and each ID can have multiple records as shown below.

df.head(4)

Unique_ID	Record_1	Record_2
AN5001	90.0	ten
AN5002	90.0	five
AN5001	95.0	five
AN5003	60.0	forty

There are 360 unique IDs. However, about half of them need to be corrected. Consider below df_corrected_ID.head(3)

Unique_ID_old	Unique_ID_new
AN5001	AN5010
AN5002	AN5002
AN5003	AN5011

How would you, most efficiently, fix the Unique ID in the main df with 9,000+ records using the df_corrected_ID data frame?

So, check if Unique_ID_old is present in df['Unique_ID'] column and if it is, replace it with Unique_ID_new from df_corrected_ID.

How would one then check change occurred correctly (for example, just show the difference between the Unique_IDs -- say after converting the original and updated columns to lists and then, list(set(Unique_ID) - set(Unique_ID_new)).

It's okay to add another new column to original df if needed with corrected IDs, as long as the order is maintained an none of the records are changed.

Thanks!

crashMOGWAI · Accepted Answer · 2022-12-09 19:54:11Z

It is simple mapping. You can add a column or update in place.

df_map = pd.DataFrame(dict(uni=['AN5001','AN5002','AN5003'], uni_update=['AN5010','AN5002','AN5011'])) 
df_record = pd.DataFrame(dict(uni=['AN5001','AN5002','AN5001','AN5003'], rec1=[1,2,3,4], rec2=['a','b','c','d']))
    
df_record['uni_update'] = df_record.uni.map(dict(zip(df_map.uni, df_map.uni_update)))

df w/ new col

|    | uni    |   rec1 | rec2   | uni_update   |
|---:|:-------|-------:|:-------|:-------------|
|  0 | AN5001 |      1 | a      | AN5010       |
|  1 | AN5002 |      2 | b      | AN5002       |
|  2 | AN5001 |      3 | c      | AN5010       |
|  3 | AN5003 |      4 | d      | AN5011       |

to check afterwards

df_record_reduced = df_record.drop_duplicates(subset='uni').loc[:,['uni','uni_update']].reset_index(drop=True)

|    | uni    | uni_update   |
|---:|:-------|:-------------|
|  0 | AN5001 | AN5010       |
|  1 | AN5002 | AN5002       |
|  2 | AN5003 | AN5011       |

df_map.equals(df_record_reduced) #should be true if everything is aligned

jprebys · Accepted Answer · 2022-12-09 19:30:00Z

1

You can use Series.replace for this. Make your corrected ID DataFrame into a dictionary that maps old IDs to new ones, and you can use it to update your existing column or create a new one.

update_dict = dict(zip(df_corrected_ID['Unique_ID_old'], df_corrected_ID['Unique_ID_new'])

# to create a new column
df['Unique_ID_updated'] = df['Unique_ID'].replace(update_dict)

# to update the existing column
df['Unique_ID'] = df['Unique_ID'].replace(update_dict)

answered Dec 9, 2022 at 19:30

jprebys

2,5261 gold badge13 silver badges17 bronze badges

Collectives™ on Stack Overflow

Modify pandas data frame values based on condition

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related