I am trying to read a CSV file similar to the one below into a pandas.DataFrame:

2011    1   10  AAPL    Buy     1500
2011    1   13  AAPL    Sell    1500
2011    1   13  IBM     Buy     4000
2011    1   26  GOOG    Buy     1000

The data has no column headers. When reading the file, I also want to parse the first 3 columns into a single 'Date' column. This is what I tried:

import pandas
pandas.read_csv(fileName,
                header = None,
                names = ('Date', 'Symbol', 'Side', 'Quantity'),
                parse_dates = {'Date' : [0, 1, 2]})

That raises:

NotImplementedError: file structure not yet supported

I tried:

pandas.read_csv(fileName,
                header = None,
                names = ('Year', 'Month', 'Day', 'Symbol', 'Side', 'Quantity'),
                parse_dates = {'Date' : ['Year', 'Month', 'Day']})

but that did not work either and raised the same exception.

I finally managed to read the file with:

orders = pandas.read_csv(fileName,
                         header = None,
                         parse_dates = {'Date' : [0, 1, 2]})
# rename() returns a new DataFrame, so keep the result
orders = orders.rename(columns = {3: 'Symbol', 4 : 'Side', 5: 'Quantity'})

Is there a way to make the first call work by passing the column names to names? Why is that exception raised? A similar problem was reported in "Pandas file structure not supported error", but I couldn't find any solution there other than the same workaround.

I am using pandas 0.18.1, which is the latest version to my knowledge.

1 Answer

Try adding one of these parameters:

sep=r'\s+'

or

delim_whitespace=True

Demo:

In [7]: %paste
(pd.read_csv(fileName, sep=r'\s+', header = None,
             names = ('Year', 'Month', 'Day', 'Symbol', 'Side', 'Quantity'),
             parse_dates = {'Date' : ['Year', 'Month', 'Day']})
)
## -- End pasted text --
Out[7]:
        Date Symbol  Side  Quantity
0 2011-01-10   AAPL   Buy      1500
1 2011-01-13   AAPL  Sell      1500
2 2011-01-13    IBM   Buy      4000
3 2011-01-26   GOOG   Buy      1000

In [8]: %paste
(pd.read_csv(fileName, delim_whitespace=True, header = None,
             names = ('Year', 'Month', 'Day', 'Symbol', 'Side', 'Quantity'),
             parse_dates = {'Date' : ['Year', 'Month', 'Day']})
)
## -- End pasted text --
Out[8]:
        Date Symbol  Side  Quantity
0 2011-01-10   AAPL   Buy      1500
1 2011-01-13   AAPL  Sell      1500
2 2011-01-13    IBM   Buy      4000
3 2011-01-26   GOOG   Buy      1000
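
For anyone who wants to reproduce this without a file on disk, here is a minimal sketch under a few assumptions: the `raw` buffer is a hypothetical stand-in for the whitespace-separated file, and building the date with `pd.to_datetime` after reading is my own substitution for the `parse_dates` dict shown above, since newer pandas releases deprecate combining columns through `parse_dates`.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for the whitespace-separated file.
raw = io.StringIO(
    "2011    1   10  AAPL    Buy     1500\n"
    "2011    1   13  AAPL    Sell    1500\n"
    "2011    1   13  IBM     Buy     4000\n"
    "2011    1   26  GOOG    Buy     1000\n"
)

# Split on runs of whitespace and name all six raw columns.
orders = pd.read_csv(
    raw,
    sep=r"\s+",
    header=None,
    names=["Year", "Month", "Day", "Symbol", "Side", "Quantity"],
)

# pd.to_datetime can assemble dates from a frame with
# year/month/day columns, so lowercase the names first.
parts = orders[["Year", "Month", "Day"]].rename(columns=str.lower)
orders.insert(0, "Date", pd.to_datetime(parts))
orders = orders.drop(columns=["Year", "Month", "Day"])

print(orders)
```

The result should have the same four columns as the demo output above (Date, Symbol, Side, Quantity).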

1 Comment

Thanks @MaxU. The actual structure of the file is:

2011,1,10,AAPL,Buy,1500,
2011,1,13,AAPL,Sell,1500,
2011,1,13,IBM,Buy,4000,
2011,1,26,GOOG,Buy,1000,

So eventually I added a dummy column, names = ('Year', 'Month', 'Day', 'Symbol', 'Side', 'Quantity', 'bla'), because of the trailing comma, and it worked. The exception message could be a bit clearer, though.
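
The trailing-comma case from this comment can be sketched as below. The `raw_text` string is a hypothetical stand-in for the real file. Option 1 follows the dummy-column approach described in the comment. Option 2, `index_col=False`, is an alternative I am suggesting rather than something from the thread; the pandas documentation describes it for files with a delimiter at the end of each line, but treat its behaviour on your pandas version as an assumption to verify.

```python
import io

import pandas as pd

# Hypothetical stand-in for the comma-separated file; every row ends
# with a trailing comma, so each row has seven fields.
raw_text = (
    "2011,1,10,AAPL,Buy,1500,\n"
    "2011,1,13,AAPL,Sell,1500,\n"
    "2011,1,13,IBM,Buy,4000,\n"
    "2011,1,26,GOOG,Buy,1000,\n"
)

names = ["Year", "Month", "Day", "Symbol", "Side", "Quantity"]

# Option 1 (from the comment): give the empty trailing field a dummy
# name, then drop that column after reading.
with_dummy = pd.read_csv(
    io.StringIO(raw_text),
    header=None,
    names=names + ["bla"],
).drop(columns="bla")

# Option 2 (assumption): index_col=False tells pandas not to treat the
# first column as an index when rows have more fields than names.
no_index = pd.read_csv(
    io.StringIO(raw_text),
    header=None,
    names=names,
    index_col=False,
)

print(with_dummy)
print(no_index)
```

Either frame can then have its Year, Month and Day columns combined into a date, for example with pd.to_datetime.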
