1

I am trying to read a csv file which includes dates. The csv looks like this:

h1,h2,h3,h4,h5
A,B,C,D,E,20150420
A,B,C,D,E,20150420
A,B,C,D,E,20150420

For reading the csv I use this code:

df = pd.read_csv(filen,
    index_col=None,
    header=0,
    parse_dates=[5],
    date_parser=lambda t:parse(t))

The parse function looks like this:

def parse(t):
    string_ = str(t)
    try:
        return datetime.date(int(string_[:4]), int(string_[4:6]), int(string_[6:]))
    except:
        return datetime.date(1900,1,1)

My strange problem now is that in the parsing function, t looks like this:

ndarray: ['20150420' '20150420' '20150420']

As you can see t is the whole array of the data column. I think it should be only the first value when parsing the first row, the second value, when parsing the second row, etc. Right now, the parse always ends up in the except-block because int(string_[:4]) contains a bracket, which, obviously, cannot be converted to an int. The parse function is built to parse only one date at a time (e.g. 20150420) in the first place.

What am I doing wrong?

EDIT:

okay, I just read in the pandas doc about the date_parser argument, and it seems to work as expected (of course ;)). So I need to adapt my code to that. My above example is copy&pasted from somewhere else and I expected it to work, hence, my question.. I will report back, when I did my code adaption.

EDIT2:

My parse function now looks like this, and I think, the code works now. If I am still doing something wrong, please let me know:

def parse(t):
    ret = []
    for ts in t:
        string_ = str(ts)
        try:
            tsdt = datetime.date(int(string_[:4]), int(string_[4:6]), int(string_[6:]))
        except:
            tsdt = datetime.date(1900,1,1)
        ret.append(tsdt)
    return ret

2 Answers 2

2

There are six columns, but only five titles in the first line. This is why parse_dates failed. you can skip the first line:

df = pd.read_csv("tmp.csv",  header=None, skiprows=1, parse_dates=[5])
Sign up to request clarification or add additional context in comments.

Comments

2

you can try this parser :

parser = lambda x: pd.to_datetime(x, format='%Y%m%d', coerce=True)

and use

df = pd.read_csv(filen,
index_col=None,
header=0,
parse_dates=[5],
date_parser=parser)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.