Reading no-header CSV to pandas dataframe and parsing date

Question

I am trying to read a CSV file like the one below into a pandas.DataFrame:

2011    1   10  AAPL    Buy     1500
2011    1   13  AAPL    Sell    1500
2011    1   13  IBM     Buy     4000
2011    1   26  GOOG    Buy     1000

The data has no column headers. While reading the file, I also want to parse the first three columns into a single 'Date' column. This is what I tried:

import pandas
pandas.read_csv(fileName,
                header = None,
                names = ('Date', 'Symbol', 'Side', 'Quantity'),
                parse_dates = {'Date' : [0, 1, 2]})

That raises:

NotImplementedError: file structure not yet supported

I tried:

pandas.read_csv(fileName,
                header = None,
                names = ('Year', 'Month', 'Day', 'Symbol', 'Side', 'Quantity'),
                parse_dates = {'Date' : ['Year', 'Month', 'Day']})

That did not work either; it raised the same exception.

In the end I managed to read the file with:

orders = pandas.read_csv(fileName,
                         header = None,
                         parse_dates = {'Date' : [0, 1, 2]})
# rename() returns a new DataFrame, so assign the result back
orders = orders.rename(columns = {3: 'Symbol', 4 : 'Side', 5: 'Quantity'})

Is there a way to make the first call work by passing column names to names? Why is that exception raised? A similar problem was reported in "Pandas file structure not supported error", but I couldn't find any solution there other than the same workaround.

I am using pandas 0.18.1, which is the latest version to my knowledge.

MaxU - stand with Ukraine · Accepted Answer · 2016-06-28 23:21:28Z

Try adding either

sep='\s+'

or

delim_whitespace=True

as a parameter, so that runs of whitespace are treated as the column separator.

Demo:

In [7]: %paste
(pd.read_csv(fileName, sep='\s+', header = None,
             names = ('Year', 'Month', 'Day', 'Symbol', 'Side', 'Quantity'),
             parse_dates = {'Date' : ['Year', 'Month', 'Day']})
)
## -- End pasted text --
Out[7]:
        Date Symbol  Side  Quantity
0 2011-01-10   AAPL   Buy      1500
1 2011-01-13   AAPL  Sell      1500
2 2011-01-13    IBM   Buy      4000
3 2011-01-26   GOOG   Buy      1000

In [8]: %paste
(pd.read_csv(fileName, delim_whitespace=True, header = None,
             names = ('Year', 'Month', 'Day', 'Symbol', 'Side', 'Quantity'),
             parse_dates = {'Date' : ['Year', 'Month', 'Day']})
)
## -- End pasted text --
Out[8]:
        Date Symbol  Side  Quantity
0 2011-01-10   AAPL   Buy      1500
1 2011-01-13   AAPL  Sell      1500
2 2011-01-13    IBM   Buy      4000
3 2011-01-26   GOOG   Buy      1000
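
As far as I know, newer pandas releases (2.2 onward) deprecate both delim_whitespace and the dict form of parse_dates, so the calls above may emit warnings there. A minimal sketch of an alternative, assuming the whitespace-separated sample from the question: read the three date parts as ordinary columns and assemble the date afterwards with pd.to_datetime. The inline sample and the variable names are illustrative, not taken from the original file.

```python
import io

import pandas as pd

# Inline copy of the whitespace-separated sample from the question.
raw = """\
2011    1   10  AAPL    Buy     1500
2011    1   13  AAPL    Sell    1500
2011    1   13  IBM     Buy     4000
2011    1   26  GOOG    Buy     1000
"""

orders = pd.read_csv(
    io.StringIO(raw),
    sep=r"\s+",  # raw string: any run of whitespace separates fields
    header=None,
    names=["Year", "Month", "Day", "Symbol", "Side", "Quantity"],
)

# Assemble one datetime column from the year/month/day parts.
orders["Date"] = pd.to_datetime(
    dict(year=orders["Year"], month=orders["Month"], day=orders["Day"])
)

# Keep only the combined date and the remaining fields, date first.
orders = orders[["Date", "Symbol", "Side", "Quantity"]]
print(orders)
```

This does the date parsing in a second step, which costs one extra line but does not depend on how read_csv combines columns.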


1 Comment

Rustam Aliyev Over a year ago

Thanks @MaxU. The actual structure of the file is:

2011,1,10,AAPL,Buy,1500,
2011,1,13,AAPL,Sell,1500,
2011,1,13,IBM,Buy,4000,
2011,1,26,GOOG,Buy,1000,

So in the end I added a dummy column, names = ('Year', 'Month', 'Day', 'Symbol', 'Side', 'Quantity', 'bla'), because of the trailing comma, and it worked. The exception message could be a bit clearer, though.
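
For the comma-separated layout with a trailing comma described in this comment, one possible alternative to a dummy column name is to restrict the read to the first six fields with usecols. This is only a sketch under that assumption; the inline sample and the names csv_raw, csv_names and csv_orders are made up for illustration.

```python
import io

import pandas as pd

# Inline copy of the layout from the comment: every row ends with a
# trailing comma, so each line has seven fields, the last one empty.
csv_raw = """\
2011,1,10,AAPL,Buy,1500,
2011,1,13,AAPL,Sell,1500,
2011,1,13,IBM,Buy,4000,
2011,1,26,GOOG,Buy,1000,
"""

csv_names = ["Year", "Month", "Day", "Symbol", "Side", "Quantity"]

csv_orders = pd.read_csv(
    io.StringIO(csv_raw),
    header=None,
    names=csv_names,
    # Read only the first six fields so the empty seventh field
    # created by the trailing comma is ignored.
    usecols=list(range(len(csv_names))),
)

# Build the date from its parts once the columns line up with the names.
csv_orders["Date"] = pd.to_datetime(
    dict(
        year=csv_orders["Year"],
        month=csv_orders["Month"],
        day=csv_orders["Day"],
    )
)
print(csv_orders[["Date", "Symbol", "Side", "Quantity"]])
```

Without usecols or a seventh name, pandas sees more fields per row than names and treats the first column as the index, which would explain why the named columns did not line up in the original call.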
