Creating a dataframe column of multiple columns

Question

I have a dataframe with a large number of columns that I would like to consolidate into more rows and fewer columns. It has a structure similar to the example below:

| 1_a | 1_b | 1_c | 2_a | 2_b | 2_c |  d  |
|-----|-----|-----|-----|-----|-----|-----|
|  1  |  2  |  3  |  1  |  2  |  6  |  z  |
|  2  |  2  |  2  |  3  |  2  |  5  |  z  |
|  3  |  2  |  1  |  4  |  1  |  4  |  z  |

I want to combine some of the columns so that the result looks like this:

| 1 | 2 | letter | d |
|---|---|--------|---|
| 1 | 1 |   a    | z |
| 2 | 3 |   a    | z |
| 3 | 4 |   a    | z |
| 2 | 2 |   b    | z |
| 2 | 2 |   b    | z |
| 2 | 1 |   b    | z |
| 3 | 6 |   c    | z |
| 2 | 5 |   c    | z |
| 1 | 4 |   c    | z |

I have created a new dataframe with the new headings, but I am unsure how to map my original headings to the new ones when appending.

Thanks

Vaishali · Accepted Answer · 2017-11-22 00:23:05Z

3

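Try:

import pandas as pd
# Move 'd' to the index so it is not split along with the other columns.
df = df.set_index('d')
df.columns = pd.MultiIndex.from_tuples([tuple(c.split('_')) for c in df.columns])
# Stack the letter level into rows, then name the resulting column.
df = df.stack().reset_index().rename(columns={'level_1': 'letter'})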

    d   letter  1   2
0   z   a       1   1
1   z   b       2   2
2   z   c       3   6
3   z   a       2   3
4   z   b       2   2
5   z   c       2   5
6   z   a       3   4
7   z   b       2   1
8   z   c       1   4
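
As a check, the snippet above can be run end to end on the sample frame from the question. This is a minimal sketch: the `DataFrame` literal is reconstructed from the question's table, `out` is a name chosen here, and the final sort is an addition (not part of the answer) that only reorders rows and columns to match the layout the question asks for.

```python
import pandas as pd

# Sample frame reconstructed from the table in the question.
df = pd.DataFrame({
    "1_a": [1, 2, 3],
    "1_b": [2, 2, 2],
    "1_c": [3, 2, 1],
    "2_a": [1, 3, 4],
    "2_b": [2, 2, 1],
    "2_c": [6, 5, 4],
    "d": ["z", "z", "z"],
})

# The three steps from the answer, applied to a separate variable.
out = df.set_index("d")
out.columns = pd.MultiIndex.from_tuples([tuple(c.split("_")) for c in out.columns])
out = out.stack().reset_index().rename(columns={"level_1": "letter"})

# Addition for illustration: group rows by letter (stable sort keeps the
# original row order within each letter) and reorder the columns.
out = (
    out.sort_values("letter", kind="mergesort")
       .reset_index(drop=True)[["1", "2", "letter", "d"]]
)
print(out)
```

The stable sort matters here: it keeps the rows for each letter in their original order, which is what the requested table shows.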

2 Comments

Christopher Ell Over a year ago

Hey Vaishali, thanks for your answer. I think I have it working; I just have two questions for my understanding. 1. Why did you set the index to column d at the beginning? 2. How does stack() know to make the first part of each header tuple the header and the second part the row index (before the reset_index())?

Vaishali Over a year ago

It's better to set column d as the index, as otherwise even that will be split into the MultiIndex, with level 0 being d and level 1 being NaN. For the second question, if you look at the documentation of stack, by default it stacks on level -1, the innermost column level, which holds the letters a, b and c in this case; the first level (1 and 2) stays as the column headers, which gives the desired output. I suggest that you break the solution above into steps and see what happens after each one.
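
One way to break the solution into steps, as suggested, is to print the column levels before and after `stack()`. The sketch below assumes the sample frame from the question; `wide` and `stacked` are names chosen here for illustration.

```python
import pandas as pd

# Sample frame reconstructed from the table in the question.
df = pd.DataFrame({
    "1_a": [1, 2, 3], "1_b": [2, 2, 2], "1_c": [3, 2, 1],
    "2_a": [1, 3, 4], "2_b": [2, 2, 1], "2_c": [6, 5, 4],
    "d": ["z", "z", "z"],
})

# Step 1: 'd' has no underscore, so splitting it yields a 1-tuple.
# This is why it is moved to the index before the columns are split.
print([tuple(c.split("_")) for c in df.columns])

# Step 2: with 'd' in the index, every column splits into (number, letter).
wide = df.set_index("d")
wide.columns = pd.MultiIndex.from_tuples([tuple(c.split("_")) for c in wide.columns])
print(wide.columns.get_level_values(0).unique().tolist())   # outer level
print(wide.columns.get_level_values(-1).unique().tolist())  # inner level

# Step 3: stack() moves the inner column level into the row index,
# leaving the outer level as the remaining column headers.
stacked = wide.stack()
print(stacked.columns.tolist())
print(stacked.index.get_level_values(-1).unique().tolist())
```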

Mark Whitfield · Accepted Answer · 2017-11-22 00:22:15Z

0

For the most part, if you need to select column names dynamically, you probably just need to write a Python loop. Run through each letter, build a frame for it, then concat them together:

dfs = []
for letter in ('a', 'b', 'c'):
    # Copy so the assignments below do not write into a slice of df.
    group = df[['d']].copy()
    group['1'] = df['1_' + letter]
    group['2'] = df['2_' + letter]
    group['letter'] = letter
    dfs.append(group)
result = pd.concat(dfs)
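
If the letters are not known in advance, the same loop can discover them from the column names. This is a sketch, assuming every column to be reshaped is named `<number>_<letter>` and that `d` is the only column without an underscore; the variable names and the `ignore_index=True` choice are illustrative, not part of the answer above.

```python
import pandas as pd

# Sample frame reconstructed from the table in the question.
df = pd.DataFrame({
    "1_a": [1, 2, 3], "1_b": [2, 2, 2], "1_c": [3, 2, 1],
    "2_a": [1, 3, 4], "2_b": [2, 2, 1], "2_c": [6, 5, 4],
    "d": ["z", "z", "z"],
})

# Collect the suffix of every column shaped like "<number>_<letter>".
letters = sorted({c.split("_", 1)[1] for c in df.columns if "_" in c})

parts = []
for letter in letters:
    # Select this letter's columns and strip the suffix: "1_a" -> "1".
    cols = [c for c in df.columns if c.endswith("_" + letter)]
    part = df[cols].rename(columns=lambda c: c.split("_", 1)[0])
    part["letter"] = letter
    part["d"] = df["d"]
    parts.append(part)

# ignore_index=True gives the combined frame a fresh 0..n-1 index.
result = pd.concat(parts, ignore_index=True)
print(result)
```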
