    Date
    2021/8/1
    8-5-2021
    8-6-2021:08:00:00 PM

I would like all the values in this column to be in the format yyyy-mm-dd.

This is what I have tried so far, but it raises an "unknown string format" error:

df['Date']= pd.to_datetime(df['Date'].str.split(':', n=1).str[0])

2 Comments

  • Each line uses a different format, so it is very difficult to say which part is the day and which is the month. It would be useful if you provided a list of all the formats you can receive as input; then a solution can be found. It doesn't matter how the parts are separated, but their order is important (e.g. ddmmyyyy, mmddyyyy, yyyyddmm, yyyymmdd, yyddmm). Try to provide the entire list of possible formats. Commented Sep 15, 2021 at 20:49
  • If you can develop a complete accounting of the date formats in this data, you might have to selectively fix the malformed elements before trying to parse. Alternatively, you can use something like Pendulum, which has a very liberal date/time parser that you can invoke by calling .apply over the series of dates. You can then convert the Pendulum DateTime objects back to "plain" datetime.datetime objects, or emit strings in a consistent format, etc. Commented Sep 15, 2021 at 20:51
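The ambiguity raised in the first comment can be checked directly. The following is a minimal sketch using only the standard library; the helper name matching_formats and the candidate format list are illustrative assumptions, not something from the question.

```python
from datetime import date, datetime


def matching_formats(value, candidates):
    """Return (format, date) pairs for every candidate format that parses value."""
    matches = []
    for fmt in candidates:
        try:
            matches.append((fmt, datetime.strptime(value, fmt).date()))
        except ValueError:
            # This format does not fit the string; try the next one.
            continue
    return matches


# Hypothetical candidate list: day-first, month-first, and year-first layouts.
candidates = ["%d-%m-%Y", "%m-%d-%Y", "%Y/%m/%d"]

# "8-5-2021" fits two layouts and yields two different dates.
ambiguous = matching_formats("8-5-2021", candidates)

# "25-5-2021" fits only the day-first layout, because 25 is not a valid month.
unambiguous = matching_formats("25-5-2021", candidates)

print(ambiguous)
print(unambiguous)
```

A value that matches more than one format cannot be resolved by code alone; the order of day and month has to come from knowledge of the data source.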

2 Answers

2

You could apply a custom function on your Date column to parse the date values.

import pandas as pd
import io
from datetime import datetime

temp_data=u"""Date
8/1/2021
8-5-2021
8-6-2021:08:00:00 PM
"""

data = pd.read_csv(io.StringIO(temp_data), sep=";", parse_dates=False)

def to_date(string_date):
    # Formats are tried in order; each one expects the day before the month.
    formats = ["%d/%m/%Y", "%d-%m-%Y", "%d-%m-%Y:%I:%M:%S %p"]
    for fmt in formats:
        try:
            # Parse the value that was passed in and keep only the date part.
            return datetime.strptime(string_date, fmt).date()
        except ValueError:
            continue
    raise RuntimeError(f"Unable to parse date {string_date} with available formats={formats}")

You can add new formats in the to_date function to parse any new date format.

data['Date'] = data['Date'].apply(to_date)

>>> data
         Date
0  2021-01-08
1  2021-05-08
2  2021-06-08
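The same try-each-format idea can be applied to the exact sample values from the question and made to emit yyyy-mm-dd strings. This is a minimal standard-library sketch, not a definitive implementation: the name normalize_date is hypothetical, and the format list assumes that slash-separated values are year/month/day and dash-separated values are day-month-year, which the question does not confirm.

```python
from datetime import datetime

# Assumed interpretations: year first for slashes, day first for dashes.
FORMATS = ["%Y/%m/%d", "%d-%m-%Y", "%d-%m-%Y:%I:%M:%S %p"]


def normalize_date(value, formats=FORMATS):
    """Return value as a 'yyyy-mm-dd' string using the first format that fits."""
    for fmt in formats:
        try:
            return datetime.strptime(value.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            # Wrong layout, or trailing text such as a time suffix remains.
            continue
    raise ValueError(f"No known format matches {value!r}")


samples = ["2021/8/1", "8-5-2021", "8-6-2021:08:00:00 PM"]
normalized = [normalize_date(s) for s in samples]
print(normalized)  # ['2021-08-01', '2021-05-08', '2021-06-08']
```

On a DataFrame, the function could be applied the same way as to_date, for example with df['Date'].apply(normalize_date).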

0

Make use of the format option. For more info, check the to_datetime documentation.

df['Date'] = pd.to_datetime(df['Date'], format='%Y%m%d')

1 Comment

Hi, please see the edit. My data comes in different formats; it's not always in the ymd format. I am not sure what to change.
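To see what would need to change, it may help to test a format string against individual values. The sketch below uses datetime.strptime from the standard library, which accepts the same strftime-style directives as the format option; the helper name parses_with is an illustrative assumption.

```python
from datetime import datetime


def parses_with(value, fmt):
    """Return True if value matches the strptime format fmt exactly."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


# '%Y%m%d' describes compact values with no separators, such as 20210801.
compact_ok = parses_with("20210801", "%Y%m%d")

# The same format does not fit a slash-separated value from the question.
slashed_fails = parses_with("2021/8/1", "%Y%m%d")

# Writing the separators into the format makes it fit.
slashed_ok = parses_with("2021/8/1", "%Y/%m/%d")

print(compact_ok, slashed_fails, slashed_ok)  # True False True
```

Because a single format string describes only one layout, a column that mixes several layouts needs either one pass per format or a function that tries each format in turn.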
