3

I came across a problem today that I unable to solve. I read a csv file using

mydata = pd.read_csv(file_name, header=0, sep=",", index_col=[0], parse_dates=True)

the CSV looks like:

2009-12-10,5,6,7,8,9  
2009-12-11,7,6,6,7,9

instead of getting an indexed dataframe i get the following output

print mydata

Empty DataFrame
Columns: []
Index: [2009-12-10,5,6,7,8,9 2009-12-11,7,6,6,7,9]

Please help!! I have been trying for 2 hours now!

Many thanks

2
  • 2
    Are you sure that's the csv? It looks like it has . instead of newline, perhaps try lineterminator='.' Commented Jan 11, 2014 at 19:18
  • 2
    can your provide output of repr(open(file_name).read()[:50])? Commented Jan 11, 2014 at 19:34

3 Answers 3

3

I think your code works. Here's what I see:

The data:

import pandas as pd

data = """2009-12-10,5,6,7,8,9
2009-12-11,7,6,6,7,9"""

Read the data from the csv.

ts = pd.read_csv(pd.io.parsers.StringIO(data),
    names=['timepoint', 'a','b','c','d','e'],
    parse_dates=True,
    index_col=0)

That looks like this

In [59]: ts
Out[59]:
            a  b  c  d  e
timepoint
2009-12-10  5  6  7  8  9
2009-12-11  7  6  6  7  9

And the index is a time series

In [60]: ts.index
Out[60]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2009-12-10 00:00:00, 2009-12-11 00:00:00]
Length: 2, Freq: None, Timezone: None

Can you give this a try and post an update if you get different results?

UPDATE: In response to @prre72's comment regarding column headers in the csv file:

If the csv has 5 column headers with the index column being unlabeled, you can do this:

In [17]: 
data = """"a","b","c","d","e"
2009-12-10,5,6,7,8,9
2009-12-11,7,6,6,7,9"""

ts = pd.read_csv(pd.io.parsers.StringIO(data),
    parse_dates=True,
    index_col=0)

In [18]: ts
Out[18]:
            a  b  c  d  e
2009-12-10  5  6  7  8  9
2009-12-11  7  6  6  7  9

In [19]: ts.index
Out[19]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2009-12-10 00:00:00, 2009-12-11 00:00:00]
Length: 2, Freq: None, Timezone: None
Sign up to request clarification or add additional context in comments.

2 Comments

I have noticed a difference: your data has no headers, while my csv has headers in it with quoted names "". Should i remove headers when reading and then add them back?
Interesting. Does the 1st line of your csv have 5 or 6 headers in it? What I mean is, does the index column have a column header? Or does the file have only 5 column headers with the index column unlabeled?
1
import pandas as pd
raw_dt = pd.read_csv("fileName.csv", import_dates = True, index_col = 0)
raw_dt

Now, when you execute this code, index_col = 0 will treat the first column from your file as the index column and import_dates = True will parse columns containing dates in your file to date type.

Comments

0

You need to use parse_dates=[0] to specify the date columns you want to parse. You don't have to sepcify header=0. Use header=None instead, which won't force you specifying headers. Try this:

mydata = pd.read_csv(file_name, header=None, sep=",", index_col=[0], 
    parse_dates=[0])
print mydata
            1  2  3  4  5
0                        
2009-12-10  5  6  7  8  9
2009-12-11  7  6  6  7  9

If you want to specify column names, just use this:

mydata.columns = list("abcde")  # list of column names

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.