
I have a dataframe as follows:

ID      Date_Loading          Date_delivery       Value
001     01.11.2017             20.11.2017         200.34
002     %^&**##_               15.01.2018         300.05
003     11.12.2018             _%67*              7*7%

As you can see, every column except ID contains special characters.

Objective: replace the entries that contain special characters with None (null), so the final dataframe should look like this:

ID      Date_Loading          Date_delivery       Value
001     01.11.2017             20.11.2017         200.34
002     Null                   15.01.2018         300.05
003     11.12.2018             Null               Null
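One way to do this null-out step, sketched under assumptions: every column is read as text, dates are always DD.MM.YYYY, and values are plain decimals. Instead of stripping characters, it keeps only entries that fully match a per-column pattern (the `patterns` dict is hypothetical) and turns the rest into missing values (NaN, pandas' null).

```python
import pandas as pd

# Sample data from the question, typed in by hand (assumption: all columns are text).
df = pd.DataFrame({
    "ID": ["001", "002", "003"],
    "Date_Loading": ["01.11.2017", "%^&**##_", "11.12.2018"],
    "Date_delivery": ["20.11.2017", "15.01.2018", "_%67*"],
    "Value": ["200.34", "300.05", "7*7%"],
})

# Hypothetical patterns describing what a *valid* entry looks like in each column.
patterns = {
    "Date_Loading": r"\d{2}\.\d{2}\.\d{4}",   # DD.MM.YYYY
    "Date_delivery": r"\d{2}\.\d{2}\.\d{4}",  # DD.MM.YYYY
    "Value": r"\d+(?:\.\d+)?",                # e.g. 200.34
}

for col, pat in patterns.items():
    # where() keeps entries whose mask is True and replaces the rest with NaN.
    df[col] = df[col].where(df[col].str.fullmatch(pat))

print(df)
```

Matching whole entries avoids the partial "cleaning" problem, where stripping characters could turn garbage such as `7*7%` into a plausible-looking `77`.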

As a next step, I want to parse the date columns into YYYY-MM-DD format.

In order to accomplish this I am using the following code snippet:

for c in df.columns.tolist():
  df[c] = df[c].astype(str).str.replace(r"[^A-Za-z0-9]"," ")
df['Date_Loading'] = pd.to_datetime(df['Date_Loading'],error='coerce',format='YYYY-MM-DD')
df['Date_delivery'] = pd.to_datetime(df['Date_Loading'],error='coerce',format='YYYY-MM-DD')

But the code above does not work; even the replacement step has no effect.

Am I missing out anything?

P.S.: I have tried the solutions from two related questions on SO, but so far with no luck.

  • Are there other columns? If yes, how should they be processed? Should all the other columns be converted to numbers? Commented Oct 28, 2020 at 12:45
  • @jezrael: just adding regex=True to str.replace does the trick. But I need to convert the date fields as I wanted, i.e. to YYYY-MM-DD format. Commented Oct 28, 2020 at 12:49
  • "But I need to convert the date fields as I wanted, i.e. to YYYY-MM-DD format." Not sure I understand; what is not working now? Commented Oct 28, 2020 at 12:56
  • Yes, it is working. Just for my understanding: if I call pd.read_csv(file, parse_dates=date_cols), will it do the job you have explained? I do not care about the Value field for now. Commented Oct 28, 2020 at 13:03
  • Unfortunately not on its own; it needs a custom function, give me a sec. Commented Oct 28, 2020 at 13:03
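Adding `regex=True` makes `str.replace` treat the pattern as a regular expression (since pandas 2.0 the default is `regex=False`, so the flag has to be passed explicitly). A small sketch of what the question's pattern then does, using values copied from the sample data: it also replaces the dots, so the dates no longer match any `DD.MM.YYYY` format afterwards.

```python
import pandas as pd

s = pd.Series(["01.11.2017", "%^&**##_", "200.34", "7*7%"])

# The pattern from the question: every character that is not a letter or digit.
cleaned = s.str.replace(r"[^A-Za-z0-9]", " ", regex=True)
print(cleaned.tolist())
# → ['01 11 2017', '        ', '200 34', '7 7 ']

# The dots are gone, so a DD.MM.YYYY format matches nothing any more.
parsed = pd.to_datetime(cleaned, format="%d.%m.%Y", errors="coerce")
print(parsed.isna().all())  # → True
```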

1 Answer


You can specify the format of the input dates, here DD.MM.YYYY, with '%d.%m.%Y', and convert numbers with to_numeric. With errors='coerce', anything that does not match becomes NaT/NaN:

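import pandas as pd

# No cleaning step is needed: errors='coerce' below turns every entry that does
# not match the expected format into NaT/NaN. (Stripping [^A-Za-z0-9] would also
# delete the dots in the dates and decimals, so nothing would parse.)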

df['Date_Loading'] = pd.to_datetime(df['Date_Loading'],errors='coerce',format='%d.%m.%Y')
df['Date_delivery'] = pd.to_datetime(df['Date_delivery'],errors='coerce',format='%d.%m.%Y')

df['Value'] = pd.to_numeric(df['Value'],errors='coerce')
print (df)
   ID Date_Loading Date_delivery   Value
0   1   2017-11-01    2017-11-20  200.34
1   2          NaT    2018-01-15  300.05
2   3   2018-12-11           NaT     NaN

print (df.dtypes)
ID                        int64
Date_Loading     datetime64[ns]
Date_delivery    datetime64[ns]
Value                   float64
dtype: object
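A self-contained version of the steps above, sketched with the question's sample data typed in by hand (assumption: ID is already numeric). A datetime64 column already prints as YYYY-MM-DD. The last step uses `dt.strftime` only in case actual YYYY-MM-DD *strings* are needed, e.g. for export.

```python
import pandas as pd

# Sample data from the question (assumption: ID already parsed as integers).
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Date_Loading": ["01.11.2017", "%^&**##_", "11.12.2018"],
    "Date_delivery": ["20.11.2017", "15.01.2018", "_%67*"],
    "Value": ["200.34", "300.05", "7*7%"],
})

for col in ["Date_Loading", "Date_delivery"]:
    # Input is DD.MM.YYYY; anything that does not match becomes NaT.
    df[col] = pd.to_datetime(df[col], errors="coerce", format="%d.%m.%Y")

# Non-numeric text such as '7*7%' becomes NaN.
df["Value"] = pd.to_numeric(df["Value"], errors="coerce")

# Optional: YYYY-MM-DD strings; NaT entries stay missing.
as_text = df["Date_Loading"].dt.strftime("%Y-%m-%d")

print(df)
print(as_text.tolist())
```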

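EDIT: To parse the dates directly while reading the file, pass a custom parser to read_csv via date_parser (the Value column is then left as text):

dateparse = lambda x: pd.to_datetime(x, format='%d.%m.%Y', errors='coerce')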

df = pd.read_csv(file, parse_dates=['Date_Loading','Date_delivery'], date_parser=dateparse)
    
print (df)
   ID Date_Loading Date_delivery   Value
0   1   2017-11-01    2017-11-20  200.34
1   2          NaT    2018-01-15  300.05
2   3   2018-12-11           NaT    7*7%
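In pandas 2.0 and later, the `date_parser` argument of `read_csv` is deprecated. A sketch that avoids it, assuming a comma-separated file with a header row (simulated here with `io.StringIO`): read the date columns as plain text, then convert them with `pd.to_datetime(..., errors='coerce')` after loading. As in the output above, Value stays as text.

```python
import io

import pandas as pd

# Stand-in for the real file (assumption: comma-separated, with a header row).
csv_text = """ID,Date_Loading,Date_delivery,Value
001,01.11.2017,20.11.2017,200.34
002,%^&**##_,15.01.2018,300.05
003,11.12.2018,_%67*,7*7%
"""

date_cols = ["Date_Loading", "Date_delivery"]

# Read the date columns as plain strings, then parse them with an explicit format.
df = pd.read_csv(io.StringIO(csv_text), dtype={c: str for c in date_cols})
for col in date_cols:
    df[col] = pd.to_datetime(df[col], format="%d.%m.%Y", errors="coerce")

print(df)
```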