
I have an ASCII file where the dates are formatted as follows:

Jan 20 2015 00:00:00.000
Jan 20 2015 00:10:00.000
Jan 20 2015 00:20:00.000
Jan 20 2015 00:30:00.000
Jan 20 2015 00:40:00.000

When I load the file into pandas, each whitespace-separated field above gets its own column in the DataFrame. I've tried variations of the following:

from pandas import read_csv
from datetime import datetime

df = read_csv('file.txt', header=None, delim_whitespace=True,
              parse_dates={'datetime': [0, 1, 2, 3]},
              date_parser=lambda x: datetime.strptime(x, '%b %d %Y %H %M %S'))

I get a couple of errors:

TypeError: <lambda>() takes 1 positional argument but 4 were given
ValueError: time data 'Jun 29 2017 00:35:00.000' does not match format '%b %d %Y %H %M %S'

I'm confused because:

  1. I'm passing a dict to parse_dates to parse the different columns as a single date.
  2. I'm using: %b - abbreviated month name, %d - day of the month, %Y - year with century, %H - hour (24-hour clock), %M - minute, and %S - second
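A minimal standard-library check of point 2, using a sample timestamp without the fractional part: in a strptime format string, every character that is not a directive must appear literally in the input, so '%H %M %S' expects whitespace where the data has colons. The helper name parses is only for illustration.

```python
from datetime import datetime


def parses(text, fmt):
    """Return True if datetime.strptime accepts text with fmt, else False."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


# Separators between directives must appear literally in the input:
# '%H %M %S' expects whitespace, but the time part uses colons.
print(parses("Jan 20 2015 00:10:00", "%b %d %Y %H %M %S"))  # False
print(parses("Jan 20 2015 00:10:00", "%b %d %Y %H:%M:%S"))  # True
```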

Does anyone see what I'm doing wrong?

Edit:

I've also tried date_parser=lambda x: datetime.strptime(x, '%b %d %Y %H:%M:%S'), which raises ValueError: unconverted data remains: .000
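The leftover ".000" is the fractional-seconds part, which %S does not consume. A short sketch with the datetime module alone, using one of the sample timestamps: a literal "." followed by %f (which accepts one to six digits) parses the whole string.

```python
from datetime import datetime

stamp = "Jan 20 2015 00:10:00.000"

# %S stops after the seconds; ".%f" consumes the ".000" that would otherwise
# trigger "unconverted data remains".
parsed = datetime.strptime(stamp, "%b %d %Y %H:%M:%S.%f")
print(parsed)  # 2015-01-20 00:10:00
```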

Edit 2:

I tried what @MaxU suggested in his update, but it didn't work for me because my original data has additional columns, like this:

Jan   1  2017  00:00:00.000   123 456 789 111 222 333 

I'm only interested in the first 7 columns, so I import my file with the following:

df = read_csv(fn, header=None, delim_whitespace=True, usecols=[0, 1, 2, 3, 4, 5, 6])

Then, to create a column with datetime information from the first 4 columns, I try:

df['datetime'] = to_datetime(df.ix[:, :3], format='%b %d %Y %H:%M:%S.%f')

However, this doesn't work because to_datetime expects "integer, float, string, datetime, list, tuple, 1-d array, Series" as the first argument, and df.ix[:, :3] returns a DataFrame like this:

         0   1     2             3
0      Jan   1  2017  00:00:00.000

How do I feed in every row of the first four columns to to_datetime such that I get one column of datetimes?
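One way to get a single datetime column from the first four columns, sketched under the assumption that they were read as separate columns as in the DataFrame above: join them back into one string per row, then call to_datetime with an explicit format. The data is inlined with io.StringIO, and the numbers in the second row are made up. sep=r'\s+' is used in place of delim_whitespace=True, which is deprecated in recent pandas.

```python
import io

import pandas as pd

# First row from the question; the second row's numbers are hypothetical
raw = io.StringIO(
    "Jan   1  2017  00:00:00.000   123 456 789 111 222 333\n"
    "Jan   1  2017  00:10:00.000   124 457 790 112 223 334\n"
)

df = pd.read_csv(raw, header=None, sep=r"\s+", usecols=[0, 1, 2, 3, 4, 5, 6])

# Vectorised join of columns 0-3 into strings like "Jan 1 2017 00:00:00.000";
# the day and year columns are read as integers, so cast them to str first
stamps = df[0].str.cat([df[1].astype(str), df[2].astype(str), df[3]], sep=" ")

# Parse all joined strings at once with an explicit format
df["datetime"] = pd.to_datetime(stamps, format="%b %d %Y %H:%M:%S.%f")
print(df[["datetime"]])
```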

Edit 3:

I think I solved my second problem. I now use the following command and do everything when I read the file in (I was basically just missing %f to parse past the seconds):

df = read_csv(fileName, header=None, delim_whitespace=True,
              parse_dates={'datetime': [0, 1, 2, 3]},
              date_parser=lambda x: datetime.strptime(x, '%b %d %Y %H:%M:%S.%f'),
              usecols=[0, 1, 2, 3, 4, 5, 6])

The whole reason I wanted to parse manually instead of letting pandas handle it like @MaxU suggested was to see whether giving pandas the format explicitly would be faster, and it is! In my tests, the snippet above runs approximately 5-6 times faster than letting pandas infer the format.
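A rough way to reproduce this kind of comparison on synthetic data; the 20,000 generated timestamps and the function names are assumptions for illustration. Timings depend on the machine and the pandas version: since pandas 2.0, to_datetime without a format infers one from the first value, so the gap is likely smaller than with the older pandas used here.

```python
import timeit

import pandas as pd

# Synthetic timestamps in the file's layout, e.g. "Jan 20 2015 00:10:00.000"
stamps = pd.Series(
    pd.date_range("2015-01-20", periods=20_000, freq="10min").strftime(
        "%b %d %Y %H:%M:%S.000"
    )
)


def parse_explicit():
    # Format supplied up front, so no guessing is needed
    return pd.to_datetime(stamps, format="%b %d %Y %H:%M:%S.%f")


def parse_inferred():
    # No format: pandas has to work out how to read the strings itself
    return pd.to_datetime(stamps)


print("explicit:", timeit.timeit(parse_explicit, number=3))
print("inferred:", timeit.timeit(parse_inferred, number=3))
```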

2 Answers


Try this simpler approach:

import pandas as pd

df = pd.read_csv('file.txt', header=None)  # header=None: the first line is data, not column names
df.columns = ['date']

df should now be a DataFrame with a single column, since the file contains no commas. Then cast that column to datetime:

df['date'] = pd.to_datetime(df['date'])
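A self-contained version of this approach, with a few sample lines inlined via io.StringIO instead of 'file.txt'. Because the data contains no commas, the default separator keeps each line in one field; header=None keeps the first timestamp from being consumed as the column name.

```python
import io

import pandas as pd

raw = io.StringIO(
    "Jan 20 2015 00:00:00.000\n"
    "Jan 20 2015 00:10:00.000\n"
    "Jan 20 2015 00:20:00.000\n"
)

# One field per line; header=None so the first line is kept as data
df = pd.read_csv(raw, header=None, names=["date"])

# No format given, so pandas works out how to parse the strings
df["date"] = pd.to_datetime(df["date"])
print(df)
```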

Pandas (tested with version 0.20.1) is smart enough to do it for you:

In [4]: pd.read_csv(fn, sep=r'\s+', header=None, parse_dates={'datetime': [0, 1, 2, 3]})
Out[4]:
             datetime
0 2015-01-20 00:00:00
1 2015-01-20 00:10:00
2 2015-01-20 00:20:00
3 2015-01-20 00:30:00
4 2015-01-20 00:40:00

UPDATE: if all entries have the same format, you can try doing it this way:

df = pd.read_csv(fn, sep='~', names=['datetime'])
df['datetime'] = pd.to_datetime(df['datetime'], format='%b %d %Y %H:%M:%S.%f')
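If you are not sure every line really shares the same format, a sketch like the following can locate the ones that don't. It assumes "~" never appears in the data; the third line is a hypothetical entry in a different layout. With errors='coerce', non-matching rows become NaT instead of raising ValueError.

```python
import io

import pandas as pd

# Two lines in the file's format plus one hypothetical line in another layout
raw = io.StringIO(
    "Jan 20 2015 00:00:00.000\n"
    "Jan 20 2015 00:10:00.000\n"
    "2015-01-20 00:20\n"
)

# '~' does not occur in the data, so each whole line lands in one column
df = pd.read_csv(raw, sep="~", names=["datetime"])

# Rows that do not match the format become NaT instead of raising
parsed = pd.to_datetime(df["datetime"], format="%b %d %Y %H:%M:%S.%f", errors="coerce")

# Show the raw strings that failed to parse
print(df.loc[parsed.isna(), "datetime"])
```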
