If I have the following dataframe, I would like to clean the data by replacing multiple strings and numbers with NaN: i.e. 68, Tardeo Road and 0 in state, 567 in dept, and #ERROR! and 123 in phonenumber:
   id                                state                          dept  phonenumber
0   1                            Abu Dhabi                   {Marketing}          123
1   2                                   MO                       {Other}   5635888000
2   3                      68, Tardeo Road           {"Human Resources"}  18006708450
3   4  National Capital Territory of Delhi           {"Human Resources"}      #ERROR!
4   5                        Aargau Canton                   {Marketing}  12032722596
5   6                        Aargau Canton                           567  18003928343
6  18                                   NB  {"Finance & Administration"}          NaN
7  19                                    0                       {Sales}      #ERROR!
8  20                            Abu Dhabi           {"Human Resources"}          NaN
9  21                               Aargau  {"Finance & Administration"}          NaN
I have tried the following code:
Solution 1:
mask = (df.state == '0') | (df.state == '68, Tardeo Road')
df.loc[mask, ['state']] = np.nan
Solution 2:
df.loc[(df.state == '68, Tardeo Road') | (df.state == '0'), 'state'] = np.nan
Solution 3:
df.loc[df.state == '0', 'state'] = np.nan
df.loc[df.state == '68, Tardeo Road', 'state'] = np.nan
All of them work, but applying them to multiple columns gets a bit long.
Just wondering if it's possible to make this more concise and efficient, for example by using str.replace? Thanks.
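One way to condense this is `DataFrame.replace` with a per-column mapping, so each column only has its own bad values swapped for NaN. A minimal sketch (the three-row frame here is a stand-in for the real data, and `to_nan` is a hypothetical name for the mapping):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "state": ["Abu Dhabi", "68, Tardeo Road", "0"],
    "dept": ["{Marketing}", "567", '{"Human Resources"}'],
    "phonenumber": ["123", "#ERROR!", "5635888000"],
})

# Map each column to the values that should become NaN in that column only.
to_nan = {
    "state": ["68, Tardeo Road", "0"],
    "dept": ["567"],
    "phonenumber": ["#ERROR!", "123"],
}

# DataFrame.replace accepts a nested {column: {old: new}} dict.
df = df.replace({col: {v: np.nan for v in vals} for col, vals in to_nan.items()})
```

If the same bad values should be wiped everywhere regardless of column, a flat `df.replace(["68, Tardeo Road", "0", "567", "#ERROR!", "123"], np.nan)` is shorter; the nested-dict form keeps the replacement scoped per column.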
| not &. How can a value be 0 and 68, Tardeo Road at the same time?
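A quick sketch of the point in that comment: a single cell can never equal both values at once, so the `&` conjunction is always False, while `|` flags rows matching either value:

```python
import pandas as pd

s = pd.Series(["Abu Dhabi", "68, Tardeo Road", "0"])

# Conjunction: no cell equals both values, so this mask is all False.
and_mask = (s == "0") & (s == "68, Tardeo Road")

# Disjunction: True wherever the cell equals either value.
or_mask = (s == "0") | (s == "68, Tardeo Road")

print(and_mask.any())      # False
print(or_mask.tolist())    # [False, True, True]
```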