42

I have a 719mb CSV file that looks like:

from, to, dep, freq, arr, code, mode   (header row)
RGBOXFD,RGBPADTON,127,0,27,99999,2
RGBOXFD,RGBPADTON,127,0,33,99999,2
RGBOXFD,RGBRDLEY,127,0,1425,99999,2
RGBOXFD,RGBCHOLSEY,127,0,52,99999,2
RGBOXFD,RGBMDNHEAD,127,0,91,99999,2
RGBDIDCOTP,RGBPADTON,127,0,46,99999,2
RGBDIDCOTP,RGBPADTON,127,0,3,99999,2
RGBDIDCOTP,RGBCHOLSEY,127,0,61,99999,2
RGBDIDCOTP,RGBRDLEY,127,0,1430,99999,2
RGBDIDCOTP,RGBPADTON,127,0,115,99999,2
and so on... 

I want to load in to a pandas DataFrame. Now I know there is a load from csv method:

 r = pd.DataFrame.from_csv('test_data2.csv')

But I specifically want to load it as a 'MultiIndex' DataFrame where from and to are the indexes:

So ending up with:

                   dep, freq, arr, code, mode
RGBOXFD RGBPADTON  127     0   27  99999    2
        RGBRDLEY   127     0   33  99999    2
        RGBCHOLSEY 127     0 1425  99999    2
        RGBMDNHEAD 127     0 1525  99999    2

etc. I'm not sure how to do that?

2 Answers 2

66

You could use pd.read_csv:

>>> df = pd.read_csv("test_data2.csv", index_col=[0,1], skipinitialspace=True)
>>> df
                       dep  freq   arr   code  mode
from       to                                      
RGBOXFD    RGBPADTON   127     0    27  99999     2
           RGBPADTON   127     0    33  99999     2
           RGBRDLEY    127     0  1425  99999     2
           RGBCHOLSEY  127     0    52  99999     2
           RGBMDNHEAD  127     0    91  99999     2
RGBDIDCOTP RGBPADTON   127     0    46  99999     2
           RGBPADTON   127     0     3  99999     2
           RGBCHOLSEY  127     0    61  99999     2
           RGBRDLEY    127     0  1430  99999     2
           RGBPADTON   127     0   115  99999     2

where I've used skipinitialspace=True to get rid of those annoying spaces in the header row.

Sign up to request clarification or add additional context in comments.

2 Comments

And if you're looking for column-wise multi-index, instead call: df = pd.read_csv("data.csv", header=[0,1], index_col=0))
Thanks to @safay - but a word of caution for anyone like me confused when working in PyCharm as the in-build dataframe viewer doesn't display correctly but has the multi-index header with a slash (row0/row1) notation.
3

from_csv() works similarly:

import pandas as pd

df = pd.DataFrame.from_csv(
    'data.txt',
    index_col = [0, 1]
)

print df

--output:--
                        dep   freq   arr   code   mode
from        to                                        
RGBOXFD    RGBPADTON    127      0    27  99999      2
           RGBPADTON    127      0    33  99999      2
           RGBRDLEY     127      0  1425  99999      2
           RGBCHOLSEY   127      0    52  99999      2
           RGBMDNHEAD   127      0    91  99999      2
RGBDIDCOTP RGBPADTON    127      0    46  99999      2
           RGBPADTON    127      0     3  99999      2
           RGBCHOLSEY   127      0    61  99999      2
           RGBRDLEY     127      0  1430  99999      2
           RGBPADTON    127      0   115  99999      2

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.from_csv.html#pandas.DataFrame.from_csv

From this discussion,

https://github.com/pydata/pandas/issues/4916

it looks like read_csv() was implemented to allow you to set more options, which makes from_csv() superfluous.

4 Comments

FYI: from_csv is deprecated (it does the same thing as read_csv, but with some surprising settings, and sometimes behaves strangely with date parsing).
from_csv is deprecated I'm not seeing that.
You're absolutely right, pandas hasn't deprecated it yet. Sorry! That said, we've been thinking about deprecating. It's also just a little weird, like how it defaults to parses ints as dates, and we've been discussing deprecating it I guess).
Np. I read the discussion at your link as well as the discussion it links to in the last post, which is the link I included in my post.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.