Pandas replace multiple values at once

Question

I'm attempting to clean up some of the Data that I have from an excel file. The file contains 7400 rows and 18 columns, which includes a list of customers with their respective addresses and other data. The problem that I'm encountering is that some of the cities are misspelled which distorts the information and makes it difficult for further processing.

  SURNAME   | ADDRESS          | CITY
0 Jenson    | 252 Des Chênes   | D.DO
1 Jean      | 236 Gouin        | DOLLARD
2 Denis     | 993 Boul. Gouin  | DOLLARD-DES-ORMEAUX
3 Bradford  | 1690 Dollard #7  | DDO
4 Alisson   | 115 Du Buisson   | IL PERROT
5 Abdul     | 9877 Boul. Gouin | Pierrefonds
6 O'Neil    | 5 Du College     | Ile Bizard
7 Bundy     | 7345 Sherbrooke  | ILLE Perot
8 Darcy     | 8671 Anthony #2  | ILE Perrot
9 Adams     | 845 Georges      | Pierrefonds

In the above example D.DO, DOLLARD, DDO should be spelled DOLLARD-DES-ORMEAUX and IL PERROT, ILLE PEROT, ILE PERROT should be spelled ILE-PERROT.

I've been able to replace the values using:

df["CITY"].replace(to_replace={"D.DO", "DOLLARD", "DDO"}, value="DOLLARD-DES-ORMEAUX", regex=True) 
df["CITY"].replace(to_replace={"IL PERROT", "ILLE PEROT", "ILE PERROT"}, value="ILE-PERROT", regex=True)

Is there some way of combining the above operations into one? I've tried:

df["CITY"].replace({to_replace={"D.DO", "DOLLARD", "DDO"}, value="DOLLARD-DES-ORMEAUX", to_replace={"IL PERROT", "ILLE PEROT", "ILE PERROT"}, value="ILE-PERROT"}, regex=True)

but I've had no luck

MaxU - stand with Ukraine · Accepted Answer · 2016-03-17 22:51:03Z

26

try .replace({}, regex=True) method:

replacements = {
   'CITY': {
      r'(D.*DO|DOLLARD.*)': 'DOLLARD-DES-ORMEAUX',
      r'I[lL]*[eE]*.*': 'ILLE Perot'}
}

df.replace(replacements, regex=True, inplace=True)

print(df)

Output:

    SURNAME           ADDRESS                 CITY
0    Jenson    252 Des Ch├¬nes  DOLLARD-DES-ORMEAUX
1      Jean         236 Gouin  DOLLARD-DES-ORMEAUX
2     Denis   993 Boul. Gouin  DOLLARD-DES-ORMEAUX
3  Bradford   1690 Dollard #7  DOLLARD-DES-ORMEAUX
4   Alisson    115 Du Buisson           ILLE Perot
5     Abdul  9877 Boul. Gouin          Pierrefonds
6    O'Neil      5 Du College           ILLE Perot
7     Bundy   7345 Sherbrooke           ILLE Perot
8     Darcy   8671 Anthony #2           ILLE Perot
9     Adams       845 Georges          Pierrefonds

edited Mar 17, 2016 at 22:51

answered Mar 17, 2016 at 22:42

MaxU - stand with Ukraine

212k37 gold badges402 silver badges436 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Alexander · Accepted Answer · 2016-03-17 22:49:01Z

4

You can create a dictionary of replacements and then iterate through them, using 'loc' for replacement.

target_for_values = {
    'DOLLARD-DES-ORMEAUX': ['D.DO', 'DOLLARD', 'DDO'], 
    'ILE-PERROT': ['IL PERROT', 'ILLE PEROT', 'ILE PERROT']}

for k, v in target_for_values.iteritems():
    df.loc[df.CITY.str.upper().isin(v), 'CITY'] = k

>>> df.CITY
                  CITY
0                 C.DO
1  DOLLARD-DES-ORMEAUX
2  DOLLARD-DES-ORMEAUX
3  DOLLARD-DES-ORMEAUX
4           ILE-PERROT
5          Pierrefonds
6           Ile Bizard
7           ILE-PERROT
8           ILE-PERROT
9          Pierrefonds

answered Mar 17, 2016 at 22:49

Alexander

111k32 gold badges212 silver badges208 bronze badges

Collectives™ on Stack Overflow

Pandas replace multiple values at once

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related