pandas parse dates from csv

Question

I am trying to read a csv file which includes dates. The csv looks like this:

h1,h2,h3,h4,h5
A,B,C,D,E,20150420
A,B,C,D,E,20150420
A,B,C,D,E,20150420

For reading the csv I use this code:

df = pd.read_csv(filen,
    index_col=None,
    header=0,
    parse_dates=[5],
    date_parser=lambda t:parse(t))

The parse function looks like this:

def parse(t):
    string_ = str(t)
    try:
        return datetime.date(int(string_[:4]), int(string_[4:6]), int(string_[6:]))
    except:
        return datetime.date(1900,1,1)

My strange problem now is that in the parsing function, t looks like this:

ndarray: ['20150420' '20150420' '20150420']

As you can see t is the whole array of the data column. I think it should be only the first value when parsing the first row, the second value, when parsing the second row, etc. Right now, the parse always ends up in the except-block because int(string_[:4]) contains a bracket, which, obviously, cannot be converted to an int. The parse function is built to parse only one date at a time (e.g. 20150420) in the first place.

What am I doing wrong?

EDIT:

okay, I just read in the pandas doc about the date_parser argument, and it seems to work as expected (of course ;)). So I need to adapt my code to that. My above example is copy&pasted from somewhere else and I expected it to work, hence, my question.. I will report back, when I did my code adaption.

EDIT2:

My parse function now looks like this, and I think, the code works now. If I am still doing something wrong, please let me know:

def parse(t):
    ret = []
    for ts in t:
        string_ = str(ts)
        try:
            tsdt = datetime.date(int(string_[:4]), int(string_[4:6]), int(string_[6:]))
        except:
            tsdt = datetime.date(1900,1,1)
        ret.append(tsdt)
    return ret

gnebehay · Accepted Answer · 2016-09-29 10:32:45Z

2

There are six columns, but only five titles in the first line. This is why parse_dates failed. you can skip the first line:

df = pd.read_csv("tmp.csv",  header=None, skiprows=1, parse_dates=[5])

edited Sep 29, 2016 at 10:32

gnebehay

1,5331 gold badge10 silver badges12 bronze badges

answered Jun 16, 2015 at 10:59

HYRY

97.8k28 gold badges197 silver badges192 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

steboc · Accepted Answer · 2015-06-16 10:37:51Z

2

you can try this parser :

parser = lambda x: pd.to_datetime(x, format='%Y%m%d', coerce=True)

and use

df = pd.read_csv(filen,
index_col=None,
header=0,
parse_dates=[5],
date_parser=parser)

answered Jun 16, 2015 at 10:37

steboc

1,1811 gold badge7 silver badges18 bronze badges

Collectives™ on Stack Overflow

pandas parse dates from csv

2 Answers 2

Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related