32

I have seen how to work with a double index, but I have not seen how to work with a two-row column headers. Is this possible?

For example, row 1 is a repetitive series of dates: 2016, 2016, 2015, 2015

Row 2 is a repetitive series of data. Dollar Sales, Unit Sales, Dollar Sales, Unit Sales.

So each "Dollar Sales" heading is actually tied to the date in the row above.

Subsequent rows are individual items with data.

Is there a way to do a groupby or some way that I can have two column headers? Ultimately, I want to line up the "Dollar Sales" as a series by date so that I can make a nice graph. Unfortunately there are multiple columns before the next "Dollar Sales" value. (More than just the one "Unit Sales" column). Also if I delete the date row above, there is no link between which "Dollar Sales" are tied to each date.

2
  • 1
    You can use a pandas.MultiIndex as header. See for example this and this. Commented Dec 6, 2016 at 21:57
  • This works, thank you. I was unaware that MultiIndex would also apply to column headers... still learning pandas. Commented Dec 8, 2016 at 21:35

2 Answers 2

43

If using pandas.read_csv() or pandas.read_table(), you can provide a list of indices for the header argument, to specify the rows you want to use for column headers. Python will generate the pandas.MultiIndex for you in df.columns:

df = pandas.read_csv('DollarUnitSales.csv', header=[0,1])

You can also use more than two rows, or non-consecutive rows, to specify the column headers:

df = pandas.read_table('DataSheet1.csv', header=[0,2,3])
Sign up to request clarification or add additional context in comments.

2 Comments

How will it look like? Can you add an example?
Awesome solution squareskittles!! This is exactly what I was looking for...for the past several days!! Thanks so much!!
1

A MultiIndex can be created from rows and assigned as the new column labels.

For example, to make the following transformation, use pd.MultiIndex.from_frame().

result

df = pd.DataFrame([[2016, 2016, 2015, 2015], 
                   ['Dollar Sales', 'Unit Sales', 'Dollar Sales', 'Unit Sales'], 
                   [1, 2, 3, 4], [5, 6, 7, 8]], columns=[*'ABCD'])

new_labels = pd.MultiIndex.from_frame(df.iloc[:2].T.astype(str), names=['Year', 'Sales'])
df1 = df.set_axis(new_labels, axis=1).iloc[2:]

A MultiIndex can also be created from the old column labels and a dataframe row. For example, to make the following transformation, use pd.MultiIndex.from_arrays().

result2

df = pd.DataFrame([['Dollar Sales', 'Unit Sales', 'Dollar Sales', 'Unit Sales'], 
                   [1, 2, 3, 4], [5, 6, 7, 8]], columns=[2016, 2016, 2015, 2015])

new_labels = pd.MultiIndex.from_arrays([df.columns, df.iloc[0]], names=['Year', 'Sales'])
df1 = df.set_axis(new_labels, axis=1).iloc[1:]

N.B. An important thing to note is the dtypes of the dataframe could be not ideal for the data it holds; astype(int) etc. could be necessary in the end.

Also, reset_index(drop=True) may be needed if the index should be reset.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.