Replace multiple strings and numbers from multiple columns with NaN in Pandas

Question

Given the following dataframe, I would like to clean the data by replacing several strings and numbers with NaN: i.e. 68, Tardeo Road and 0 in state, 567 in dept, and #ERROR! and 123 in phonenumber:

   id                                state                          dept  \
0   1                            Abu Dhabi                   {Marketing}   
1   2                                   MO                       {Other}   
2   3                      68, Tardeo Road           {"Human Resources"}   
3   4  National Capital Territory of Delhi           {"Human Resources"}   
4   5                        Aargau Canton                   {Marketing}   
5   6                        Aargau Canton                           567   
6  18                                   NB  {"Finance & Administration"}   
7  19                                    0                       {Sales}   
8  20                            Abu Dhabi           {"Human Resources"}   
9  21                               Aargau  {"Finance & Administration"}   

   phonenumber  
0          123  
1   5635888000  
2  18006708450  
3      #ERROR!  
4  12032722596  
5  18003928343  
6          NaN  
7      #ERROR!  
8          NaN  
9          NaN

I have tried the following code:

Solution 1:

mask = (df.state == '0') | (df.state == '68, Tardeo Road')
df.loc[mask, ['state']] = np.nan

Solution 2:

df.loc[(df.state == '68, Tardeo Road') | (df.state == '0'), 'state'] = np.nan

Solution 3:

df.loc[df.state == '0', 'state'] = np.nan
df.loc[df.state == '68, Tardeo Road', 'state'] = np.nan

All of them work, but applying them to multiple columns gets rather long.

Is it possible to make this more concise and efficient, for example by using str.replace? Thanks.

It should be | not &. How can a value be 0 and 68... at the same time?
– Henry Yik, Commented Jun 2, 2020 at 3:07
Thanks. After retesting, all three solutions work. But is it possible to make them more concise, especially when there are many columns?
– ah bon, Commented Jun 2, 2020 at 3:17

Quang Hoang · Accepted Answer · 2020-06-02 03:23:55Z

You can use replace with a dict that maps each column to the values to replace:

df = df.replace({'state':['68, Tardeo Road','0'],
                 'dept':['567'],
                 'phonenumber':['#ERROR!','123']}, np.nan)

Output:

      id                                state                          dept    phonenumber
--  ----  -----------------------------------  ----------------------------  -------------
0   1     Abu Dhabi                            {Marketing}                             nan
1   2     MO                                   {Other}                          5635888000
2   3     nan                                  {"Human Resources"}             18006708450
3   4     National Capital Territory of Delhi  {"Human Resources"}                     nan
4   5     Aargau Canton                        {Marketing}                     12032722596
5   6     Aargau Canton                        nan                             18003928343
6   18    NB                                   {"Finance & Administration"}            nan
7   19    nan                                  {Sales}                                 nan
8   20    Abu Dhabi                            {"Human Resources"}                     nan
9   21    Aargau                               {"Finance & Administration"}            nan
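
For anyone who wants to try this without the original data, here is a minimal, self-contained sketch of the same idea. The four-row frame is a cut-down stand-in for the question's data, and it assumes every value is stored as a string; the name bad_values is hypothetical. If a column actually holds the integer 0 rather than the string '0', the list would need 0 instead, since the match is on the value as stored. The loop at the end shows an isin + mask form that should flag the same cells, in case that reads more naturally for your use case.

```python
import numpy as np
import pandas as pd

# Cut-down stand-in for the question's frame; all non-id values are strings
# (an assumption about how the data was loaded).
df = pd.DataFrame({
    "id": [1, 3, 6, 19],
    "state": ["Abu Dhabi", "68, Tardeo Road", "Aargau Canton", "0"],
    "dept": ["{Marketing}", '{"Human Resources"}', "567", "{Sales}"],
    "phonenumber": ["123", "18006708450", "18003928343", "#ERROR!"],
})

# Hypothetical name: per-column lists of values to treat as missing.
bad_values = {
    "state": ["68, Tardeo Road", "0"],
    "dept": ["567"],
    "phonenumber": ["#ERROR!", "123"],
}

# Same call shape as in the answer: column -> values to replace, all set to NaN.
cleaned = df.replace(bad_values, np.nan)

# Alternative per-column form: build a boolean mask with isin, then blank
# out the matching cells with mask (which fills with NaN by default).
alt = df.copy()
for col, values in bad_values.items():
    alt[col] = alt[col].mask(alt[col].isin(values))

print(cleaned)
```

Keeping the values in one dict means a new column or a new junk value is a one-line change, rather than another df.loc statement per value.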
