Python: Replace the special character by NULL in each column in pandas dataframe

Question

I have a dataframe as follows:

ID      Date_Loading          Date_delivery       Value
001     01.11.2017             20.11.2017         200.34
002     %^&**##_               15.01.2018         300.05
003     11.12.2018             _%67*              7*7%
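To experiment with the answers below, the sample frame can be rebuilt from the text above. This is a minimal sketch that assumes the columns are separated by runs of whitespace. `raw` is simply the table pasted in. Using `dtype=str` is a deliberate choice: it keeps every cell as text (so `001` keeps its leading zeros) until each column is converted on purpose.

```python
import io

import pandas as pd

# Sample data copied from the question; assumed to be whitespace-separated.
raw = """ID      Date_Loading          Date_delivery       Value
001     01.11.2017             20.11.2017         200.34
002     %^&**##_               15.01.2018         300.05
003     11.12.2018             _%67*              7*7%
"""

# dtype=str keeps every cell as text until each column is converted deliberately.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", dtype=str)
print(df)
```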

As we can see, every column except ID contains special characters.

Objective: to replace those special characters with None, so that the final dataframe looks like this:

ID      Date_Loading          Date_delivery       Value
001     01.11.2017             20.11.2017         200.34
002     Null                   15.01.2018         300.05
003     11.12.2018             Null               Null
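One way to meet this objective literally is to validate each whole value against the shape it should have and turn anything else into a null, rather than deleting individual characters. This is a minimal sketch under two assumptions: dates are always DD.MM.YYYY, and Value is a plain decimal. The `patterns` dict and its regexes are hypothetical and do not come from the question. pandas marks the rejected cells with its usual missing-value marker (NaN), which `isna()` and `pd.to_datetime` both treat as null.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["001", "002", "003"],
    "Date_Loading": ["01.11.2017", "%^&**##_", "11.12.2018"],
    "Date_delivery": ["20.11.2017", "15.01.2018", "_%67*"],
    "Value": ["200.34", "300.05", "7*7%"],
})

# Hypothetical description of what a valid value looks like in each column.
patterns = {
    "Date_Loading": r"\d{2}\.\d{2}\.\d{4}",
    "Date_delivery": r"\d{2}\.\d{2}\.\d{4}",
    "Value": r"\d+(?:\.\d+)?",
}

for col, pattern in patterns.items():
    # True only where the entire string matches the expected pattern.
    valid = df[col].str.fullmatch(pattern)
    # Keep valid values; every other cell becomes a missing value.
    df[col] = df[col].where(valid)

print(df)
```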

As a next step, I want to parse the date columns into YYYY-MM-DD format.
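Parsing and formatting are two separate steps. The `format` passed to `pd.to_datetime` describes the *input* layout, which is DD.MM.YYYY here, i.e. `'%d.%m.%Y'`. The YYYY-MM-DD text is then produced from the parsed values with `Series.dt.strftime`. This is a minimal sketch that assumes the column holds strings like the sample. The second step is only needed if strings are wanted rather than datetime64 values.

```python
import pandas as pd

s = pd.Series(["01.11.2017", "%^&**##_", "11.12.2018"])

# format describes the input layout; unparsable entries become NaT.
parsed = pd.to_datetime(s, format="%d.%m.%Y", errors="coerce")

# Render the parsed dates as YYYY-MM-DD strings; NaT becomes a missing value.
as_text = parsed.dt.strftime("%Y-%m-%d")
print(as_text.tolist())
```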

In order to accomplish this I am using the following code snippet:

for c in df.columns.tolist():
  df[c] = df[c].astype(str).str.replace(r"[^A-Za-z0-9]"," ")
df['Date_Loading'] = pd.to_datetime(df['Date_Loading'],error='coerce',format='YYYY-MM-DD')
df['Date_delivery'] = pd.to_datetime(df['Date_Loading'],error='coerce',format='YYYY-MM-DD')

But the code above is not working. Even the replacement step on its own does not work.

Am I missing out anything?

P.S.: I have tried the suggestions in this and this SO question, but so far no luck.

Are there other columns? If yes, how should they be processed? Should all the other columns be converted to numbers?
– jezrael, Commented Oct 28, 2020 at 12:45
@jezrael: just adding regex=True to str.replace does the trick. But I still need to convert the date fields to the YYYY-MM-DD format I wanted.
– pythondumb, Commented Oct 28, 2020 at 12:49
"But I need to convert the date fields as I wanted, i.e. YYYY-MM-DD format." Not sure I understand; is it still not working?
– jezrael, Commented Oct 28, 2020 at 12:56
Yes, it is working. Just for my understanding: if I use pd.read_csv(file, parse_dates=date_cols), will it do the same job you explained? I do not care about the Value field for now.
– pythondumb, Commented Oct 28, 2020 at 13:03
Unfortunately not on its own; it needs a custom function. Give me a sec.
– jezrael, Commented Oct 28, 2020 at 13:03
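Spelling out `regex=True`, as suggested in the comments, removes any dependence on the default for `Series.str.replace`. That default changed in pandas 2.0 from `True` to `False`, so on current pandas the pattern is otherwise searched for as literal text. The small sketch below runs both modes on two of the sample values. It also shows that character-level replacement never produces a null: the garbage value turns into a string of spaces, and the dots inside the valid date are replaced as well.

```python
import pandas as pd

s = pd.Series(["01.11.2017", "%^&**##_"])
pattern = r"[^A-Za-z0-9]"

# regex=False: the pattern is searched for as literal text, so nothing changes.
literal = s.str.replace(pattern, " ", regex=False)

# regex=True: every character outside A-Z, a-z, 0-9 is replaced by a space.
as_regex = s.str.replace(pattern, " ", regex=True)

print(literal.tolist())
print(as_regex.tolist())
```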

jezrael · Accepted Answer · 2020-10-28 13:07:25Z

You can specify the format of the input datetimes, here DD.MM.YYYY, with '%d.%m.%Y', and use to_numeric to convert the numbers:

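import pandas as pd

#for processing all columns (optional, not applied for the output below: it also removes
#the '.' separators, so dates no longer match '%d.%m.%Y' and 200.34 would become 20034)
#df = df.astype(str).replace(r"[^A-Za-z0-9]","", regex=True)

df['Date_Loading'] = pd.to_datetime(df['Date_Loading'],errors='coerce',format='%d.%m.%Y')
df['Date_delivery'] = pd.to_datetime(df['Date_delivery'],errors='coerce',format='%d.%m.%Y')

#unparsable values such as '7*7%' become NaN
df['Value'] = pd.to_numeric(df['Value'],errors='coerce')
print (df)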
   ID Date_Loading Date_delivery   Value
0   1   2017-11-01    2017-11-20  200.34
1   2          NaT    2018-01-15  300.05
2   3   2018-12-11           NaT     NaN

print (df.dtypes)
ID                        int64
Date_Loading     datetime64[ns]
Date_delivery    datetime64[ns]
Value                   float64
dtype: object
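If the end goal is text output with YYYY-MM-DD dates and literal nulls, as in the target table of the question, the datetime64 columns do not have to be turned back into strings by hand. `DataFrame.to_csv` accepts a `date_format` for datetimes and an `na_rep` for missing values. This is a minimal sketch in which the frame is rebuilt inline to mirror the result above. Writing the string `NULL` is an assumption about the desired output.

```python
import io

import pandas as pd

# Frame shaped like the answer's result (datetime64 dates, float values).
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Date_Loading": pd.to_datetime(["01.11.2017", "%^&**##_", "11.12.2018"],
                                   format="%d.%m.%Y", errors="coerce"),
    "Date_delivery": pd.to_datetime(["20.11.2017", "15.01.2018", "_%67*"],
                                    format="%d.%m.%Y", errors="coerce"),
    "Value": pd.to_numeric(pd.Series(["200.34", "300.05", "7*7%"]), errors="coerce"),
})

buf = io.StringIO()
# date_format controls how datetimes are written; na_rep is the text used for NaT/NaN.
df.to_csv(buf, index=False, date_format="%Y-%m-%d", na_rep="NULL")
print(buf.getvalue())
```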

EDIT:

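import pandas as pd

# Parse DD.MM.YYYY; anything that does not match becomes NaT.
# Note: date_parser is deprecated since pandas 2.0.
dateparse = lambda x: pd.to_datetime(x, format='%d.%m.%Y', errors='coerce')

df = pd.read_csv(file, parse_dates=['Date_Loading','Date_delivery'], date_parser=dateparse)
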
print (df)
   ID Date_Loading Date_delivery   Value
0   1   2017-11-01    2017-11-20  200.34
1   2          NaT    2018-01-15  300.05
2   3   2018-12-11           NaT    7*7%
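On pandas 2.0 and later, `date_parser` is deprecated, and the `read_csv` documentation points to a different route for messy columns. `parse_dates` returns a column unchanged as `object` when any value cannot be parsed, so the conversion is better done with `pd.to_datetime` after reading. Below is a sketch of that route. It assumes a comma-separated file, with the sample rows inlined via `io.StringIO` in place of the question's `file`. It also coerces Value, which the EDIT output leaves as `7*7%`.

```python
import io

import pandas as pd

# Sample rows, assumed comma-separated, standing in for the question's file.
raw = """ID,Date_Loading,Date_delivery,Value
001,01.11.2017,20.11.2017,200.34
002,%^&**##_,15.01.2018,300.05
003,11.12.2018,_%67*,7*7%
"""

date_cols = ["Date_Loading", "Date_delivery"]

# Read the messy columns as plain text; nothing is parsed during reading.
df = pd.read_csv(io.StringIO(raw), dtype={c: str for c in date_cols + ["Value"]})

# Parse afterwards: unparsable strings become NaT (dates) or NaN (numbers).
for col in date_cols:
    df[col] = pd.to_datetime(df[col], format="%d.%m.%Y", errors="coerce")
df["Value"] = pd.to_numeric(df["Value"], errors="coerce")

print(df)
print(df.dtypes)
```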
