Parsing a datetime string when reading in data with pandas

Question

I am struggling with reading in a datetime string and storing it as a variable. I have a block of data that looks like this:

2011-11-01 05:20:00 00:10:00
#    z  speed    dir      W   sigW       bck   error 
30   4.76  238.9   0.01   0.13  7.56E+06       0
40   5.24  237.1  -0.05   0.12  5.99E+06       0
50   6.33  236.6  -0.01   0.12  7.24E+06       0
60   7.06  237.3  -0.01   0.12  9.15E+06       0
70   7.85  238.2  -0.02   0.13  8.47E+06       0
80   8.85  237.3  -0.03   0.14  1.05E+07     256

2011-11-01 05:30:00 00:10:00
#    z  speed    dir      W   sigW       bck   error 
30   4.40  234.8   0.08   0.12  1.33E+07       0
40   5.07  234.2   0.11   0.12  5.82E+06       0
50   5.75  234.3   0.12   0.12  6.61E+06       0
60   6.56  232.4   0.08   0.13  6.39E+06       0
70   7.22  233.2   0.10   0.13  5.64E+06       0
80   8.15  235.3   0.12   0.14  5.87E+06     256

My code works great for what I need to do except for reading in the datetime string, because I keep getting an error. Here is my code:

import pandas as pd
import glob
import datetime

def parse_date(string):
    # Split the string into year/month/date, time, and seconds
    split_string = string.split()
    # get year month and date
    year = split_string[0].split('-')[0]
    month = split_string[0].split('-')[1]
    date = split_string[0].split('-')[2]

    # get hour minute second
    hour = split_string[1].split(':')[0]
    mm = split_string[1].split(':')[1]
    second = split_string[1].split(':')[2]

    return datetime.datetime(int(year), int(month), int(date), hour=int(hour), minute=int(mm), second=int(second))

filename = glob.glob('1511??.mnd')
data_nov15_hereford = pd.DataFrame()
frames = []
dates = []
counter = 1
for i in filename:
    f_nov15_hereford = pd.read_csv(i, skiprows = 32, sep=r'\s+')
    for line in f_nov15_hereford:
        if line.startswith('20'):
            print line
            dates.append(parse_date(line))
            counter = 0
        else:
            counter += 1 
    frames.append(f_nov15_hereford) 
data_nov15_hereford = pd.concat(frames,ignore_index=True)
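# convert_objects(convert_numeric=True) no longer exists in current pandas; to_numeric with errors='coerce' does the same job
data_nov15_hereford = data_nov15_hereford.apply(pd.to_numeric, errors='coerce')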

My error is with my parsing function:

     15     # get hour minute second
---> 16     hour = split_string[1].split(':')[0]
     17     mm = split_string[1].split(':')[1]
     18     second = split_string[1].split(':')[2]

IndexError: list index out of range

If anyone could help me figure out this error that would be great. Thanks!
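The traceback can be reproduced in isolation. parse_date assumes its argument contains at least a date token and a time token separated by whitespace. The sketch below condenses the question's splitting logic and assumes the function is handed only the date part of the header line (the ValueError quoted in the comments further down points that way); with nothing after the date, split() returns a one-element list and index [1] fails.

```python
import datetime


def parse_date(string):
    # Condensed version of the question's logic: expects
    # "YYYY-MM-DD HH:MM:SS ..." with whitespace between the parts
    split_string = string.split()
    year, month, day = (int(p) for p in split_string[0].split('-'))
    hour, minute, second = (int(p) for p in split_string[1].split(':'))
    return datetime.datetime(year, month, day, hour, minute, second)


# A full header line parses fine
print(parse_date('2011-11-01 05:20:00 00:10:00'))  # 2011-11-01 05:20:00

# Only the date token: split() yields ['2011-11-01'], so split_string[1] fails
try:
    parse_date('2011-11-01')
except IndexError as exc:
    print('IndexError:', exc)  # IndexError: list index out of range
```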

chishaku · Accepted Answer · 2016-02-19 20:56:34Z

2

Don't reinvent the wheel by writing your own date parsing function. Use the datetime.datetime.strptime function from the standard library.

Pass the date string and the format of the string to the strptime function.

import datetime
date_string = '2011-11-01 05:20:00'
date_object = datetime.datetime.strptime(date_string, '%Y-%m-%d %H:%M:%S')
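Since the parsed values end up in a pandas DataFrame anyway, pandas also has parsers that accept the same format codes. A minimal sketch, assuming a reasonably recent pandas: pd.to_datetime for the timestamp, and pd.to_timedelta for the trailing 00:10:00 field, read here as a duration (an assumption about what that field means in the data).

```python
import datetime

import pandas as pd

date_string = '2011-11-01 05:20:00'

# pandas parser with an explicit format; returns a pandas Timestamp
ts = pd.to_datetime(date_string, format='%Y-%m-%d %H:%M:%S')

# Standard-library parser with the same format string; returns datetime.datetime
dt = datetime.datetime.strptime(date_string, '%Y-%m-%d %H:%M:%S')

# Timestamp is a datetime subclass, so the two compare equal
print(ts == dt)  # True

# An "HH:MM:SS" field read as a duration rather than a clock time
interval = pd.to_timedelta('00:10:00')
print(interval)  # 0 days 00:10:00
```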

It looks like you're dealing with a string that has a date and time along with an interval? You can parse the date, time and interval separately:

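original_string = '2011-11-01 05:20:00 00:10:00'
date_string, time_string, interval_string = original_string.split()
date_object = datetime.datetime.strptime(date_string, '%Y-%m-%d')
# No leading space in the format: a space in a strptime format requires whitespace in the input
time_object = datetime.datetime.strptime(time_string, '%H:%M:%S')
# time_object and interval_object are full datetimes dated 1900-01-01
interval_object = datetime.datetime.strptime(interval_string, '%H:%M:%S')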
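Parsed separately like this, the time and the interval both come back as datetimes dated 1900-01-01, which is rarely what you want. A sketch of an alternative, assuming the third field is a duration: parse the first two fields together as one timestamp and turn the third into a datetime.timedelta. The helper name parse_header is hypothetical.

```python
import datetime


def parse_header(line):
    """Hypothetical helper: split 'YYYY-MM-DD HH:MM:SS HH:MM:SS' into
    (timestamp, interval), treating the last field as a duration."""
    date_string, time_string, interval_string = line.split()
    # Date and time parsed together into a single datetime
    timestamp = datetime.datetime.strptime(
        date_string + ' ' + time_string, '%Y-%m-%d %H:%M:%S')
    # Duration built from the hour/minute/second fields
    hours, minutes, seconds = (int(p) for p in interval_string.split(':'))
    interval = datetime.timedelta(hours=hours, minutes=minutes, seconds=seconds)
    return timestamp, interval


timestamp, interval = parse_header('2011-11-01 05:20:00 00:10:00')
print(timestamp)             # 2011-11-01 05:20:00
print(interval)              # 0:10:00
print(timestamp + interval)  # 2011-11-01 05:30:00
```

Because split() with no argument splits on any whitespace, a trailing \r\n on the line is discarded as well.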

I would review the docs for parsing and formatting dates (the "strftime() and strptime() Behavior" section of the Python datetime documentation).

edited Feb 19, 2016 at 20:56

answered Feb 19, 2016 at 20:43



Comments

HM14 Over a year ago

It's weird because when I put this first suggestion into my code I get an error: ValueError: time data '2015-11-01' does not match format '%Y-%m-%d %H:%M:%S'

chishaku Over a year ago

If your string is simply '2015-11-01', then the format is '%Y-%m-%d'.

chishaku Over a year ago

%Y represents the four-digit year, %m represents the two-digit, zero-padded month, %d represents the two-digit, zero-padded day.
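To make the format codes concrete: the format string has to describe the input exactly, including any spaces. A short sketch with the standard library, using the date string from this thread.

```python
import datetime

# Date-only input with a date-only format: the time fields default to midnight
d = datetime.datetime.strptime('2015-11-01', '%Y-%m-%d')
print(d)  # 2015-11-01 00:00:00

# Mismatched formats raise ValueError instead of guessing
for fmt in ('%Y-%m-%d %H:%M:%S',  # asks for a time the input lacks
            '%Y-%m-%d '):         # trailing space needs trailing whitespace
    try:
        datetime.datetime.strptime('2015-11-01', fmt)
    except ValueError as exc:
        print('ValueError:', exc)
```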

HM14 Over a year ago

But if you look at my data, there is more to the string (hours, minutes, seconds), but for some reason my code isn't grabbing the entire line. Maybe that is because of the startswith part of my loop? Hmm, so this should work; I think my problem now is with how I am grabbing my string.
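A likely explanation, assuming read_csv picked up a date line as the header row: iterating over a DataFrame yields its column labels, not the lines of the file, and with sep='\s+' the line '2011-11-01 05:20:00 00:10:00' becomes three separate labels. The small DataFrame below is hypothetical and only mimics that situation.

```python
import pandas as pd

# Hypothetical frame whose header came from a whitespace-split date line
df = pd.DataFrame(
    [[30, 4.76, 238.9], [40, 5.24, 237.1]],
    columns=['2011-11-01', '05:20:00', '00:10:00'],
)

# Iterating over a DataFrame yields column labels, one token at a time
for label in df:
    print(repr(label))
# '2011-11-01'
# '05:20:00'
# '00:10:00'

# So the only "line" starting with '20' is the bare date token,
# which has no time part for parse_date to split off
print('2011-11-01'.split())  # ['2011-11-01']
```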

chishaku Over a year ago

Depending on how pandas is reading the csv, there might be characters trailing the date string, such as a carriage return, which is represented as \r. Instead of print line, try print repr(line); it might show you something like '2011-11-01 05:20:00\r'. If it does, you will have to strip that \r (for example with line.strip()).
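Rather than letting read_csv consume the header lines, one option is to read the file line by line, strip each line (which also removes a stray \r), and build the table yourself. A sketch under these assumptions: every block starts with a 'YYYY-MM-DD HH:MM:SS HH:MM:SS' line followed by a '#' column-name line, blank lines separate blocks, and io.StringIO stands in for open(filename). The function name read_blocks is hypothetical.

```python
import datetime
import io

import pandas as pd

# Stand-in for open(filename); note the Windows-style '\r\n' endings
raw = (
    '2011-11-01 05:20:00 00:10:00\r\n'
    '#    z  speed    dir      W   sigW       bck   error\r\n'
    '30   4.76  238.9   0.01   0.13  7.56E+06       0\r\n'
    '40   5.24  237.1  -0.05   0.12  5.99E+06       0\r\n'
    '\r\n'
    '2011-11-01 05:30:00 00:10:00\r\n'
    '#    z  speed    dir      W   sigW       bck   error\r\n'
    '30   4.40  234.8   0.08   0.12  1.33E+07       0\r\n'
)


def read_blocks(lines):
    """Hypothetical reader: one row per data line, tagged with its block's timestamp."""
    rows, columns, timestamp = [], None, None
    for line in lines:
        line = line.strip()  # drops '\n', '\r' and surrounding spaces
        if not line:
            continue  # blank separator line
        fields = line.split()
        if line.startswith('#'):
            columns = fields[1:]  # column names without the leading '#'
        elif len(fields) == 3 and '-' in fields[0]:
            # Block header: date and time parsed together
            timestamp = datetime.datetime.strptime(
                fields[0] + ' ' + fields[1], '%Y-%m-%d %H:%M:%S')
        else:
            rows.append([timestamp] + [float(v) for v in fields])
    return pd.DataFrame(rows, columns=['time'] + columns)


# repr() makes the hidden line ending visible
print(repr(raw.splitlines(keepends=True)[0]))  # '2011-11-01 05:20:00 00:10:00\r\n'

df = read_blocks(io.StringIO(raw))
print(df[['time', 'z', 'speed']])
```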


Marcelo Bielsa · Accepted Answer · 2016-02-19 20:52:03Z

1

You could simply get the datetime string

thestring = "2011-11-01 05:20:00 00:10:00"

then convert the date and time parts to a datetime:

import datetime
aa = thestring.split(" ")
t = datetime.datetime.strptime(aa[0] + " " + aa[1], "%Y-%m-%d %H:%M:%S")

and finally access the hour, minute, etc. For example:

t.hour
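Splitting on a literal single space works for this header line, but in the data blocks the columns are separated by runs of spaces, and split(" ") then returns empty strings. A short comparison, plus a few more of the attributes available on the parsed datetime (standard library only):

```python
import datetime

thestring = "2011-11-01 05:20:00 00:10:00"
aa = thestring.split()  # no argument: splits on any run of whitespace
t = datetime.datetime.strptime(aa[0] + " " + aa[1], "%Y-%m-%d %H:%M:%S")
print(t.year, t.month, t.day, t.hour, t.minute)  # 2011 11 1 5 20
print(t.date(), t.time())  # 2011-11-01 05:20:00

# On a data row, a literal " " separator leaves empty strings behind
row = "30   4.76  238.9"
print(row.split(" "))  # ['30', '', '', '4.76', '', '238.9']
print(row.split())     # ['30', '4.76', '238.9']
```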

answered Feb 19, 2016 at 20:52

