I have a dataframe with a large number of columns that I would like to consolidate into more rows and fewer columns. It has a similar structure to the example below:

| 1_a | 1_b | 1_c | 2_a | 2_b | 2_c |  d  |
|-----|-----|-----|-----|-----|-----|-----|
|  1  |  2  |  3  |  1  |  2  |  6  |  z  |
|  2  |  2  |  2  |  3  |  2  |  5  |  z  |
|  3  |  2  |  1  |  4  |  1  |  4  |  z  |

I want to combine some of the columns so the result looks like this:

| 1 | 2 | letter | d |
|---|---|--------|---|
| 1 | 1 |   a    | z |
| 2 | 3 |   a    | z |
| 3 | 4 |   a    | z |
| 2 | 2 |   b    | z |
| 2 | 2 |   b    | z |
| 2 | 1 |   b    | z |
| 3 | 6 |   c    | z |
| 2 | 5 |   c    | z |
| 1 | 4 |   c    | z |

I have created a new dataframe with the new headings, but I am unsure how to map my original headings to the new ones when appending.

Thanks

2 Answers

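Try:

import pandas as pd
# Keep 'd' as the index so it is not split along with the other column names.
df = df.set_index('d')
df.columns = pd.MultiIndex.from_tuples([tuple(c.split('_')) for c in df.columns])
# Move the letter level into the rows, then turn the index back into columns.
df = df.stack().reset_index().rename(columns={'level_1': 'letter'})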

    d   letter  1   2
0   z   a       1   1
1   z   b       2   2
2   z   c       3   6
3   z   a       2   3
4   z   b       2   2
5   z   c       2   5
6   z   a       3   4
7   z   b       2   1
8   z   c       1   4

2 Comments

Hey Vaishali, thanks for your answer. I think I have it working; I just have two questions for my understanding. 1. Why did you set the index to column d at the beginning? 2. How does stack() know to make the first part of each header tuple the column header and the second part the row index (before the reset_index())?
It's better to set column d as the index, as otherwise even that will be split into the MultiIndex, with level 0 being d and level 1 being NaN. For the second question, if you look at the documentation of stack, by default it stacks level -1, the innermost column level, which is the letter (a, b, c) in this case. The outer level (1 and 2) stays as the columns, which gives the desired output. I suggest that you break the solution above into steps and see what happens after each one.
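To make the step-by-step suggestion concrete, here is a minimal sketch that runs the answer one step at a time. The sample frame is reconstructed from the question's first table, and the names step1, step2 and step3 are only illustrative. On recent pandas versions stack() may emit a FutureWarning about its new implementation; the result for this data should be the same.

```python
import pandas as pd

# Sample frame reconstructed from the question's first table.
df = pd.DataFrame({
    "1_a": [1, 2, 3], "1_b": [2, 2, 2], "1_c": [3, 2, 1],
    "2_a": [1, 3, 4], "2_b": [2, 2, 1], "2_c": [6, 5, 4],
    "d": ["z", "z", "z"],
})

# Step 1: move 'd' out of the columns so it is not split on '_'.
step1 = df.set_index("d")

# Step 2: turn '1_a' into ('1', 'a'); level 0 is the number, level 1 the letter.
step2 = step1.copy()
step2.columns = pd.MultiIndex.from_tuples(
    [tuple(c.split("_")) for c in step2.columns]
)
print(step2.columns.get_level_values(0).unique().tolist())   # the number level
print(step2.columns.get_level_values(-1).unique().tolist())  # the letter level

# Step 3: stack() moves the innermost column level (the letter) into the
# row index; the number level is left behind as the columns.
step3 = step2.stack()
print(step3.index.nlevels, step3.columns.tolist())

# Step 4: reset_index() turns both index levels back into ordinary columns.
# The letter level has no name, so it comes out as 'level_1'.
result = step3.reset_index().rename(columns={"level_1": "letter"})
print(result)
```

Printing each intermediate frame shows that 'd' stays in the index throughout, and that only the letter level moves from the columns into the rows.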

For the most part, if you need to select column names dynamically, you probably just need to write a Python loop. Run through each letter, build the piece for that letter, then concatenate the pieces:

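import pandas as pd

dfs = []
for letter in ('a', 'b', 'c'):
    # Copy so the assignments below do not write into a slice of df.
    group = df[['d']].copy()
    group['1'] = df['1_' + letter]
    group['2'] = df['2_' + letter]
    group['letter'] = letter
    dfs.append(group)
# ignore_index gives the combined frame a fresh 0..n-1 index.
result = pd.concat(dfs, ignore_index=True)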
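As a self-contained variant of the loop above, the sketch below derives the letters from the column names instead of hard-coding them. It assumes every column to reshape is named number, underscore, letter, as in the question's example; the sample frame is reconstructed from the question, and the final column order is chosen to match the desired table.

```python
import pandas as pd

# Sample frame reconstructed from the question's first table.
df = pd.DataFrame({
    "1_a": [1, 2, 3], "1_b": [2, 2, 2], "1_c": [3, 2, 1],
    "2_a": [1, 3, 4], "2_b": [2, 2, 1], "2_c": [6, 5, 4],
    "d": ["z", "z", "z"],
})

# Collect the distinct suffixes of columns shaped like '<number>_<letter>'.
letters = sorted({c.split("_")[1] for c in df.columns if "_" in c})

parts = []
for letter in letters:
    part = df[["d"]].copy()  # copy to avoid writing into a slice of df
    part["1"] = df["1_" + letter]
    part["2"] = df["2_" + letter]
    part["letter"] = letter
    parts.append(part)

# Stack the per-letter pieces and reorder columns like the desired output.
result = pd.concat(parts, ignore_index=True)[["1", "2", "letter", "d"]]
print(result)
```

Because the pieces are concatenated letter by letter, the rows come out grouped by letter, in the same order as the table in the question.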
