I have an ascii file where the dates are formatted as follows:
Jan 20 2015 00:00:00.000
Jan 20 2015 00:10:00.000
Jan 20 2015 00:20:00.000
Jan 20 2015 00:30:00.000
Jan 20 2015 00:40:00.000
When loading the file into pandas, each whitespace-separated field above ends up in its own DataFrame column. I've tried variations of the following:
from pandas import read_csv
from datetime import datetime
df = read_csv('file.txt', header=None, delim_whitespace=True,
              parse_dates={'datetime': [0, 1, 2, 3]},
              date_parser=lambda x: datetime.strptime(x, '%b %d %Y %H %M %S'))
I get a couple of errors:
TypeError: <lambda>() takes 1 positional argument but 4 were given
ValueError: time data 'Jun 29 2017 00:35:00.000' does not match format '%b %d %Y %H %M %S'
I'm confused because:
- I'm passing a dict to parse_dates to parse the different columns as a single date.
- I'm using: %b - abbreviated month name, %d - day of the month, %Y - year with century, %H - 24-hour, %M - minute, and %S - second
Anyone see what I'm doing incorrectly?
Edit:
I've tried date_parser=lambda x: datetime.strptime(x, '%b %d %Y %H:%M:%S') which returns ValueError: unconverted data remains: .000
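For reference, the two ValueErrors can be reproduced outside pandas with plain datetime.strptime. This is only a minimal sketch: try_parse is a hypothetical helper, and the sample string is the one from the error message above. It also tries a third format that adds %f, the standard directive for fractional seconds.

```python
from datetime import datetime

stamp = "Jun 29 2017 00:35:00.000"  # sample value taken from the error message


def try_parse(fmt):
    """Hypothetical helper: return the parsed datetime, or the error text."""
    try:
        return datetime.strptime(stamp, fmt)
    except ValueError as exc:
        return str(exc)


# Spaces in the time part, but the data uses colons.
spaces = try_parse("%b %d %Y %H %M %S")
# Colons match, but nothing in the format consumes ".000".
no_fraction = try_parse("%b %d %Y %H:%M:%S")
# %f consumes the fractional seconds.
full = try_parse("%b %d %Y %H:%M:%S.%f")

print(spaces)
print(no_fraction)
print(full)
```

The first two calls return the error text instead of raising, so all three formats can be compared side by side.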
Edit 2:
I tried what @MaxU suggested in his update, but it was problematic because my original data is formatted like the following:
Jan 1 2017 00:00:00.000 123 456 789 111 222 333
I'm only interested in the first 7 columns so I import my file with the following:
df = read_csv(fn, header=None, delim_whitespace=True, usecols=[0, 1, 2, 3, 4, 5, 6])
Then to create a column with datetime information from the first 4 columns I try:
df['datetime'] = to_datetime(df.ix[:, :3], format='%b %d %Y %H:%M:%S.%f')
However this doesn't work because to_datetime expects "integer, float, string, datetime, list, tuple, 1-d array, Series" as the first argument and df.ix[:, :3] returns a dataframe with the following format:
0 1 2 3
0 Jan 1 2017 00:00:00.000
How do I feed in every row of the first four columns to to_datetime such that I get one column of datetimes?
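One possible approach, sketched here under assumptions, is to join the four fields row by row into a single Series of strings and pass that to to_datetime. The in-memory sample, the dtype mapping, and the name joined are illustrative and not part of my real script. sep=r"\s+" is used in place of delim_whitespace=True, which I understand to be equivalent.

```python
import io

import pandas as pd

# Two made-up rows in the same layout as the real file.
sample = io.StringIO(
    "Jan 1 2017 00:00:00.000 123 456 789 111 222 333\n"
    "Jan 1 2017 00:10:00.000 124 457 790 112 223 334\n"
)

# Read the four date fields as strings so they can be concatenated.
df = pd.read_csv(
    sample,
    header=None,
    sep=r"\s+",
    usecols=[0, 1, 2, 3, 4, 5, 6],
    dtype={0: str, 1: str, 2: str, 3: str},
)

# Build one string per row, e.g. "Jan 1 2017 00:00:00.000".
joined = df[0] + " " + df[1] + " " + df[2] + " " + df[3]

# to_datetime accepts a 1-d Series of strings plus an explicit format.
df["datetime"] = pd.to_datetime(joined, format="%b %d %Y %H:%M:%S.%f")
print(df[["datetime", 4, 5, 6]])
```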
Edit 3:
I think I solved my second problem.
I just use the following command and do everything when I read my file in (I was basically just missing %f to parse past seconds):
df = read_csv(fileName, header=None, delim_whitespace=True,
              parse_dates={'datetime': [0, 1, 2, 3]},
              date_parser=lambda x: datetime.strptime(x, '%b %d %Y %H:%M:%S.%f'),
              usecols=[0, 1, 2, 3, 4, 5, 6])
The whole reason I wanted to parse manually instead of letting pandas handle it like @MaxU suggested was to see if manually feeding in instructions would be faster - and it is! From my tests the snippet above runs approximately 5-6 times faster than letting pandas infer parsing for you.
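A comparison like this can be checked with a small timing harness. The sketch below is a stand-in rather than my actual test: it times pd.to_datetime on synthetic strings with and without an explicit format instead of going through read_csv, and the sample size and repeat count are arbitrary assumptions. The numbers will vary with the pandas version and the machine, so no expected timings are shown.

```python
import timeit

import pandas as pd

# Synthetic timestamps at 10-minute steps, rendered in the file's layout.
# Note that strftime's %f writes six fractional digits rather than three.
strings = pd.date_range("2015-01-20", periods=1000, freq="10min").strftime(
    "%b %d %Y %H:%M:%S.%f"
)

fmt = "%b %d %Y %H:%M:%S.%f"

# Parse once each way to confirm both routes agree before timing them.
explicit = pd.to_datetime(strings, format=fmt)
inferred = pd.to_datetime(strings)

t_explicit = timeit.timeit(lambda: pd.to_datetime(strings, format=fmt), number=5)
t_inferred = timeit.timeit(lambda: pd.to_datetime(strings), number=5)

print(f"explicit format: {t_explicit:.4f} s")
print(f"inferred format: {t_inferred:.4f} s")
```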