Given the following dataframe, I would like to clean the data by replacing several strings and numbers with NaN, i.e. 68, Tardeo Road and 0 in state, 567 in dept, and #ERROR! and 123 in phonenumber:

   id                                state                          dept  \
0   1                            Abu Dhabi                   {Marketing}   
1   2                                   MO                       {Other}   
2   3                      68, Tardeo Road           {"Human Resources"}   
3   4  National Capital Territory of Delhi           {"Human Resources"}   
4   5                        Aargau Canton                   {Marketing}   
5   6                        Aargau Canton                           567   
6  18                                   NB  {"Finance & Administration"}   
7  19                                    0                       {Sales}   
8  20                            Abu Dhabi           {"Human Resources"}   
9  21                               Aargau  {"Finance & Administration"}   

   phonenumber  
0          123  
1   5635888000  
2  18006708450  
3      #ERROR!  
4  12032722596  
5  18003928343  
6          NaN  
7      #ERROR!  
8          NaN  
9          NaN

I have tried the following code:

Solution 1:

mask = (df.state == '0') | (df.state == '68, Tardeo Road')
df.loc[mask, ['state']] = np.nan

Solution 2:

df.loc[(df.state == '68, Tardeo Road') | (df.state == '0'), 'state'] = np.nan

Solution 3:

df.loc[df.state == '0', 'state'] = np.nan
df.loc[df.state == '68, Tardeo Road', 'state'] = np.nan

All of them work, but applying them to multiple columns gets a bit long.

Is it possible to make this more concise and efficient, for example by using str.replace? Thanks.

  • It should be | not &. How can a value be 0 and 68... at the same time? Commented Jun 2, 2020 at 3:07
  • Thanks. After retesting, all three solutions work. But is it possible to make it more concise, especially when we have many columns? Commented Jun 2, 2020 at 3:17

1 Answer

You can use DataFrame.replace with a dict that maps each column to the list of values to replace:

df = df.replace({'state':['68, Tardeo Road','0'],
                 'dept':['567'],
                 'phonenumber':['#ERROR!','123']}, np.nan)

Output:

      id                                state                          dept    phonenumber
--  ----  -----------------------------------  ----------------------------  -------------
0   1     Abu Dhabi                            {Marketing}                             nan
1   2     MO                                   {Other}                          5635888000
2   3     nan                                  {"Human Resources"}             18006708450
3   4     National Capital Territory of Delhi  {"Human Resources"}                     nan
4   5     Aargau Canton                        {Marketing}                     12032722596
5   6     Aargau Canton                        nan                             18003928343
6   18    NB                                   {"Finance & Administration"}            nan
7   19    nan                                  {Sales}                                 nan
8   20    Abu Dhabi                            {"Human Resources"}                     nan
9   21    Aargau                               {"Finance & Administration"}            nan
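
For reference, here is a minimal, self-contained sketch of the same idea on a five-row subset of the question's data. It assumes every unwanted value is stored as a string (so '0', '567' and '123' rather than integers); if a column actually holds numbers, the matching entries would need to be numeric as well. The name bad_values is just an illustrative choice, and the isin/mask loop is shown only as an alternative that should give the same result, not as part of the original answer.

```python
import numpy as np
import pandas as pd

# Subset of the question's rows; all non-id values are assumed to be strings.
df = pd.DataFrame({
    "id": [1, 2, 3, 6, 19],
    "state": ["Abu Dhabi", "MO", "68, Tardeo Road", "Aargau Canton", "0"],
    "dept": ["{Marketing}", "{Other}", '{"Human Resources"}', "567", "{Sales}"],
    "phonenumber": ["123", "5635888000", "18006708450", "18003928343", "#ERROR!"],
})

# Hypothetical helper dict: column name -> values to treat as missing.
bad_values = {
    "state": ["68, Tardeo Road", "0"],
    "dept": ["567"],
    "phonenumber": ["#ERROR!", "123"],
}

# Option 1: a single replace call, restricted per column by the dict keys.
cleaned = df.replace(bad_values, np.nan)

# Option 2: the same result with isin + mask, one column at a time.
cleaned_alt = df.copy()
for col, bad in bad_values.items():
    cleaned_alt[col] = cleaned_alt[col].mask(cleaned_alt[col].isin(bad))

print(cleaned)
```

Keeping the unwanted values in one dict means a new column only needs one more entry, instead of another mask expression.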