I am struggling with reading in a datetime string and storing it as a variable. I have a block of data that looks like this:

2011-11-01 05:20:00 00:10:00
#    z  speed    dir      W   sigW       bck   error 
30   4.76  238.9   0.01   0.13  7.56E+06       0
40   5.24  237.1  -0.05   0.12  5.99E+06       0
50   6.33  236.6  -0.01   0.12  7.24E+06       0
60   7.06  237.3  -0.01   0.12  9.15E+06       0
70   7.85  238.2  -0.02   0.13  8.47E+06       0
80   8.85  237.3  -0.03   0.14  1.05E+07     256

2011-11-01 05:30:00 00:10:00
#    z  speed    dir      W   sigW       bck   error 
30   4.40  234.8   0.08   0.12  1.33E+07       0
40   5.07  234.2   0.11   0.12  5.82E+06       0
50   5.75  234.3   0.12   0.12  6.61E+06       0
60   6.56  232.4   0.08   0.13  6.39E+06       0
70   7.22  233.2   0.10   0.13  5.64E+06       0
80   8.15  235.3   0.12   0.14  5.87E+06     256

My code works great for what I need to do except for reading in the datetime string because I keep getting an error. Here is my code:

import pandas as pd
import glob
import datetime

def parse_date(string):
    # Split the string into year/month/date, time, and seconds
    split_string = string.split()
    # get year month and date
    year = split_string[0].split('-')[0]
    month = split_string[0].split('-')[1]
    date = split_string[0].split('-')[2]

    # get hour minute second
    hour = split_string[1].split(':')[0]
    mm = split_string[1].split(':')[1]
    second = split_string[1].split(':')[2]

    return datetime.datetime(int(year), int(month), int(date), hour=int(hour), minute=int(mm), second=int(second))

filename = glob.glob('1511??.mnd')
data_nov15_hereford = pd.DataFrame()
frames = []
dates = []
counter = 1
for i in filename:
    f_nov15_hereford = pd.read_csv(i, skiprows = 32, sep='\s+')
    for line in f_nov15_hereford:
        if line.startswith('20'):
            print line
            dates.append(parse_date(line))
            counter = 0
        else:
            counter += 1 
    frames.append(f_nov15_hereford) 
data_nov15_hereford = pd.concat(frames,ignore_index=True)
data_nov15_hereford = data_nov15_hereford.convert_objects(convert_numeric=True)

My error is with my parsing function:

     15     # get hour minute second
---> 16     hour = split_string[1].split(':')[0]
     17     mm = split_string[1].split(':')[1]
     18     second = split_string[1].split(':')[2]

IndexError: list index out of range

If anyone could help me figure out this error that would be great. Thanks!

2 Answers

2

Don't reinvent the wheel by writing your own date parsing function. Use the datetime.datetime.strptime function from the standard library.

Pass the date string and the format of the string to the strptime function.

import datetime
date_string = '2011-11-01 05:20:00'
date_object = datetime.datetime.strptime(date_string, '%Y-%m-%d %H:%M:%S')

It looks like you're dealing with a string that has a date and time along with an interval? You can parse the date, time and interval separately:

original_string = '2011-11-01 05:20:00 00:10:00'
date_string, time_string, interval_string = original_string.split()
date_object = datetime.datetime.strptime(date_string, '%Y-%m-%d')
time_object = datetime.datetime.strptime(time_string, '%H:%M:%S')
interval_object = datetime.datetime.strptime(interval_string, '%H:%M:%S')
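
Since the third field looks like a duration rather than a time of day, one option is to turn it into a datetime.timedelta instead. This is only a minimal sketch, assuming every header line has exactly the three whitespace-separated fields shown in the question; parse_header is a hypothetical helper name, not something from the original code:

```python
import datetime

def parse_header(line):
    """Split a header like '2011-11-01 05:20:00 00:10:00' into (timestamp, interval)."""
    # Assumption: exactly three whitespace-separated fields per header line.
    date_part, time_part, interval_part = line.split()
    timestamp = datetime.datetime.strptime(
        date_part + ' ' + time_part, '%Y-%m-%d %H:%M:%S')
    # Interpret the last field as a duration (HH:MM:SS), not a clock time.
    hours, minutes, seconds = (int(x) for x in interval_part.split(':'))
    interval = datetime.timedelta(hours=hours, minutes=minutes, seconds=seconds)
    return timestamp, interval

timestamp, interval = parse_header('2011-11-01 05:20:00 00:10:00')
print(timestamp, interval)
```

With a timedelta you can do arithmetic directly, for example timestamp + interval to get the end of the averaging window, if that is what the field means.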

I would review the Python docs on strftime() and strptime() format codes for parsing and formatting dates.

8 Comments

It's weird because when I put this first suggestion into my code I get an error: ValueError: time data '2015-11-01' does not match format '%Y-%m-%d %H:%M:%S'
If your string is simply '2015-11-01', then the format is '%Y-%m-%d'.
%Y represents the four digit year, %m represents the two-digit, zero-padded month, %d represents the two-digit, zero-padded day.
But if you look at my data, there is more to the string (the h, m, s), and for some reason my code isn't grabbing the entire line. Maybe that is because of the startswith part of my loop? So this should work; I think my problem now is with how I am grabbing my string.
Depending on how pandas is reading the csv, there might be extra characters trailing the date string, such as a carriage return (\r). Instead of print line, try print repr(line); it might show you something like '2011-11-01 05:20:00\r'. If it does, you will have to strip that \r, for example with line.strip().
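
To make that concrete, here is a rough sketch that works on the raw text lines, strips trailing whitespace such as \r, and parses only the lines that start with '20'. It assumes the layout from the question; collect_timestamps and the sample list are made up for illustration, and with a real file you would pass an open file object instead of a list of strings:

```python
import datetime

def collect_timestamps(lines):
    """Return a datetime for every header line in an iterable of raw text lines."""
    stamps = []
    for raw in lines:
        line = raw.strip()  # drops '\r', '\n' and surrounding spaces
        if not line.startswith('20'):
            continue  # data rows, '#' column headers, blank lines
        fields = line.split()
        # Assumption: the first two fields are the date and the time.
        stamps.append(datetime.datetime.strptime(
            fields[0] + ' ' + fields[1], '%Y-%m-%d %H:%M:%S'))
    return stamps

# Hypothetical sample with Windows-style line endings.
sample = [
    '2011-11-01 05:20:00 00:10:00\r\n',
    '#    z  speed    dir\r\n',
    '30   4.76  238.9\r\n',
    '\r\n',
    '2011-11-01 05:30:00 00:10:00\r\n',
]
print(repr(sample[0]))  # repr makes the trailing '\r\n' visible
print(collect_timestamps(sample))
```

Note that the startswith('20') test is taken from the question as-is; if a data row could also begin with 20 (for example a height of 20), a stricter check would be needed.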
1

You could simply get the datetime string

import datetime
thestring = "2011-11-01 05:20:00 00:10:00"

then convert to time

aa = thestring.split(" ")
t = datetime.datetime.strptime(aa[0] + " " + aa[1], "%Y-%m-%d %H:%M:%S")

and finally access the hour, minutes, etc. E.g.,

t.hour
