2

I have the following input file, trans.csv:

Date,Currenncy,Symbol,Type,Units,UnitPrice,Cost,Tax
2012-03-14,USD,AAPL,BUY,1000
2012-05-12,USD,SBUX,SELL,500

The fields UnitPrice, Cost and Tax are optional. If they are not specified, I expect NaN in the corresponding DataFrame cells.

I read the csv file with:

t = pandas.read_csv('trans.csv', parse_dates=True, index_col=0)

and got the following result:

           Currenncy Symbol  Type  Units   UnitPrice       Cost       Tax
Date                                                                     
2012-03-14       USD   AAPL   BUY   1000  2012-05-12  012-05-12  12-05-12
2012-02-05       USD   SBUX  SELL    500         NaN        NaN       NaN

Why are there no NaN values in the first row, and why is the date repeated? Is there a workaround to get NaN for the unspecified fields?

1
  • Added this as an issue on GitHub. The answer I posted should fix it for now (it also handles the case where some of the optional columns contain data)... Commented Jan 9, 2013 at 15:26

2 Answers

3

Your CSV file is malformed. I get the same answer as you in Pandas 0.10, and while I admit that it is indeed very, very strange, you shouldn't be feeding it malformed data.

Date,Currenncy,Symbol,Type,Units,UnitPrice,Cost,Tax
2012-03-14,USD,AAPL,BUY,1000,,,
2012-05-12,USD,SBUX,SELL,500,,,

returns the expected result:

>>> import pandas as pd
>>> t = pd.read_csv('pandas_test', parse_dates=True, index_col=0)
>>> t
           Currenncy Symbol  Type  Units  UnitPrice  Cost  Tax
Date                                                          
2012-03-14       USD   AAPL   BUY   1000        NaN   NaN  NaN
2012-05-12       USD   SBUX  SELL    500        NaN   NaN  NaN
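If the file cannot be fixed at the source, the same repair can be done in code before parsing. The following is only a minimal sketch using the standard library; `pad_short_rows` is a hypothetical helper name (not part of pandas), and it assumes the header row defines the expected number of fields.

```python
import csv
import io


def pad_short_rows(text):
    """Return CSV text in which every row has as many fields as the header.

    Hypothetical helper for illustration; assumes the first row is the header.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return text
    width = len(rows[0])  # the header defines the expected field count
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        if not row:
            continue  # skip blank lines
        # Append empty fields to rows that stop early; full rows are unchanged.
        writer.writerow(row + [""] * (width - len(row)))
    return out.getvalue()


raw = (
    "Date,Currenncy,Symbol,Type,Units,UnitPrice,Cost,Tax\n"
    "2012-03-14,USD,AAPL,BUY,1000\n"
    "2012-05-12,USD,SBUX,SELL,500\n"
)
fixed = pad_short_rows(raw)
print(fixed)
```

The padded text can then be wrapped in `io.StringIO` and passed to `pd.read_csv` with the same `parse_dates=True, index_col=0` arguments used above.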

7 Comments

That's one comma too many; now Units is NaN, with no error message! Can't pandas make the extra commas optional? It looks cleaner.
"Looks cleaner". Why do you care what the original data looks like if you're parsing it? It doesn't matter that it looks cleaner if it's incorrect.
@hayden, indeed, being fed malformed data is a fact of life. Can't we expect pandas to handle this gracefully? It's not that malformed.
@rdw I think so. It looks like a bug.
"that malformed". Being malformed is binary: either it's malformed or it isn't.
2

Here's a method that can handle some more cases (for example, when there is some data in UnitPrice, Cost, etc.).

In [1]: df = pd.read_csv('trans.csv', header=None)

In [2]: df.columns = df.iloc[0]

In [3]: df[1:].set_index('Date')
Out[3]: 
           Currenncy Symbol  Type Units UnitPrice Cost  Tax
Date                                                       
2012-03-14       USD   AAPL   BUY  1000       NaN  NaN  NaN
2012-05-12       USD   SBUX  SELL   500       NaN  NaN  NaN

It's worth noting that the dtype of these columns will be object.
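If numeric dtypes are needed afterwards, the columns can be converted explicitly. This is a rough sketch rather than part of the original method: it assumes a recent pandas version (so it uses `iloc` and `pd.to_numeric`), reads the sample data from an in-memory string instead of trans.csv, and the list of columns to convert is my own choice.

```python
import io

import pandas as pd

# Same sample data as trans.csv, held in memory for a self-contained example.
raw = (
    "Date,Currenncy,Symbol,Type,Units,UnitPrice,Cost,Tax\n"
    "2012-03-14,USD,AAPL,BUY,1000\n"
    "2012-05-12,USD,SBUX,SELL,500\n"
)

# Read without a header so the first line fixes the column count,
# then promote that first row to column labels.
df = pd.read_csv(io.StringIO(raw), header=None)
df.columns = df.iloc[0]
df = df[1:].set_index("Date")

# Assumption: these are the columns that should hold numbers.
numeric_cols = ["Units", "UnitPrice", "Cost", "Tax"]
for col in numeric_cols:
    # errors="coerce" turns anything unparseable into NaN instead of raising.
    df[col] = pd.to_numeric(df[col], errors="coerce")

print(df.dtypes)
```

The index is still made of strings at this point; `pd.to_datetime(df.index)` would convert it if dates are needed.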

However, I think this should be caught by read_csv, so I posted it as an issue on GitHub.
