2

I've got a dataframe called base_varlist2 that has some date columns, all as a string in varying formats. Here's a quick snapshot of the first column:

In [27]: print(base_varlist2.iloc[0])
completion_date_latest               07/10/2004
completion_date_original             17/05/1996
customer_birth_date_1                04/02/1963
customer_birth_date_2                       NaN
d_start                               01Feb2018
latest_maturity_date                 01/03/2027
latest_valuation_date                08/05/2004
sdate                                       NaN
startdt_def                                 NaN

As you can see, some of these columns are empty, some aren't.

I need a way of dynamically changing these to datetime values, regardless of their existing string format. Is there a way to do this from the dataframe, without specifying a specific format for each column? For example, I won't always know if d_start will be in %d%b%Y format.

The dataframe was made from the original CSV which is much larger, and contains lots of other columns (that aren't dates), like text fields and numeric fields. I'm using a read_csv method to bring that in, as below:

source_data = pd.read_csv(loc + 'TEST_FILE.csv')

0

2 Answers 2

4

Pandas'read_csv function supports parse_dates parameter, which can be a list of column names for which you want a conversion from str to datetime.

Alternatively, you can always convert a series using pandas.to_datetime, e.g.:

df['date'] = pandas.to_datetime(df['date'])

Notice that pandas makes no assumption on the format you used for dates.

Sign up to request clarification or add additional context in comments.

3 Comments

So I tried this by looping through my columns and creating a new dataframe, with all the columns converted to datetime, but I'm getting an error when I try to view the resultant dataset. When I submit print(mm_dates_base.iloc[0]), I get ValueError: cannot convert float NaN to integer.
That's weird, and probably not directly related to the question though.
You're right. I'll mark this down as complete but I've raised another question about this problem here: stackoverflow.com/questions/49756649/…
1

Unfortunately this does not working on fly, need specify each column for parse to dates, docs:

source_data = pd.read_csv(loc + 'TEST_FILE.csv', parse_dates=['date1','date2','date6'])
#or specify columns by positions
source_data = pd.read_csv(loc + 'TEST_FILE.csv', parse_dates=[0,1,6])

In my opinion reason is performance and also avoid converting numeric values like 2000 to datetimes.

2 Comments

How can I do this in a case insensitive way? I have my list of date columns, but their names are formatted slightly differently in the CSV file...
@Aaraeus - I think not, then try second solution for seelct by positions.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.