Parsing dates to datetime on the fly in python using pandas

Question

I've got a dataframe called base_varlist2 that has some date columns, all as a string in varying formats. Here's a quick snapshot of the first column:

In [27]: print(base_varlist2.iloc[0])
completion_date_latest               07/10/2004
completion_date_original             17/05/1996
customer_birth_date_1                04/02/1963
customer_birth_date_2                       NaN
d_start                               01Feb2018
latest_maturity_date                 01/03/2027
latest_valuation_date                08/05/2004
sdate                                       NaN
startdt_def                                 NaN

As you can see, some of these columns are empty, some aren't.

I need a way of dynamically changing these to datetime values, regardless of their existing string format. Is there a way to do this from the dataframe, without specifying a specific format for each column? For example, I won't always know if d_start will be in %d%b%Y format.

The dataframe was made from the original CSV which is much larger, and contains lots of other columns (that aren't dates), like text fields and numeric fields. I'm using a read_csv method to bring that in, as below:

source_data = pd.read_csv(loc + 'TEST_FILE.csv')

Guybrush · Accepted Answer · 2018-04-10 12:16:02Z

4

Pandas'read_csv function supports parse_dates parameter, which can be a list of column names for which you want a conversion from str to datetime.

Alternatively, you can always convert a series using pandas.to_datetime, e.g.:

df['date'] = pandas.to_datetime(df['date'])

Notice that pandas makes no assumption on the format you used for dates.

answered Apr 10, 2018 at 12:16

Guybrush

2,8421 gold badge13 silver badges20 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

Aaraeus Over a year ago

So I tried this by looping through my columns and creating a new dataframe, with all the columns converted to datetime, but I'm getting an error when I try to view the resultant dataset. When I submit print(mm_dates_base.iloc[0]), I get ValueError: cannot convert float NaN to integer.

Guybrush Over a year ago

That's weird, and probably not directly related to the question though.

Aaraeus Over a year ago

You're right. I'll mark this down as complete but I've raised another question about this problem here: stackoverflow.com/questions/49756649/…

jezrael · Accepted Answer · 2018-04-10 12:23:18Z

1

Unfortunately this does not working on fly, need specify each column for parse to dates, docs:

source_data = pd.read_csv(loc + 'TEST_FILE.csv', parse_dates=['date1','date2','date6'])
#or specify columns by positions
source_data = pd.read_csv(loc + 'TEST_FILE.csv', parse_dates=[0,1,6])

In my opinion reason is performance and also avoid converting numeric values like 2000 to datetimes.

edited Apr 10, 2018 at 12:23

answered Apr 10, 2018 at 12:17

jezrael

868k103 gold badges1.4k silver badges1.3k bronze badges

2 Comments

Aaraeus Over a year ago

How can I do this in a case insensitive way? I have my list of date columns, but their names are formatted slightly differently in the CSV file...

jezrael Over a year ago

@Aaraeus - I think not, then try second solution for seelct by positions.

Collectives™ on Stack Overflow

Parsing dates to datetime on the fly in python using pandas

2 Answers 2

3 Comments

2 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

3 Comments

2 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related