1

I have a DataFrame that looks something like this:

df = pd.DataFrame({
    'A': [0, 1, 2, 3, 4],
    'B': ['a', 'b', 'c', 'd', 'e'],
    'V1': [0.0, 0.1, 0.2, 0.3, 0.4],
    'V2': [1.0, 1.1, 1.2, 1.3, 1.4],
    'V3': [2.0, 2.1, 2.2, 2.3, 2.4],
    'X': ['alpha', 'beta', 'gamma', 'delta', 'epsilon'],
})
   A  B   V1   V2   V3        X
0  0  a  0.0  1.0  2.0    alpha
1  1  b  0.1  1.1  2.1     beta
2  2  c  0.2  1.2  2.2    gamma
3  3  d  0.3  1.3  2.3    delta
4  4  e  0.4  1.4  2.4  epsilon

I'd like to use the number in the V columns to spread this out into a long-form table. The number in the column label (1 for V1, 2 for V2 etc) would become a new column value in a column named "V Number" or whatever and the value would be the only "V" in that row. Something like this (I've hidden the index here as I don't care about that):

A  B  V Number    V        X
0  a      1     0.0    alpha     # Old first row, V1 value
0  a      2     1.0    alpha     # Old first row, V2 value
0  a      3     2.0    alpha     # Old first row, V3 value
1  b      1     0.1     beta     # Old second row, V1 value
1  b      2     1.1     beta     # etc...
1  b      3     2.1     beta
2  c      1     0.2    gamma
2  c      2     1.2    gamma
2  c      3     2.2    gamma
3  d      1     0.3    delta
3  d      2     1.3    delta
3  d      3     2.3    delta
4  e      1     0.4  epsilon
4  e      2     1.4  epsilon
4  e      3     2.4  epsilon

In the real DataFrame there are over 40 "V" columns, over 100 other columns and several thousand rows, so a reasonable simple and fast method would be nice! In case it helps, the column names are easy to isolate (they're actually called e.g. Test Voltage (3)', but I shortened them for the purposes of the example) with something like [i for i in df.columns if 'Test Voltage' in i].

Has anyone got any ideas of a straightforward way to do this? I've tried searching for lots of methods, but keep just finding ways to split columns with lists in the cells.

3
  • 2
    df.melt(['A','B','X']) Or df.set_index(['A','B','X']).stack().reset_index()? Commented Oct 26, 2021 at 14:50
  • 2
    Does this answer your question: stackoverflow.com/questions/50098113/… Commented Oct 26, 2021 at 14:55
  • @arp5, Yes, it looks like that's the same problem: you're obviously better at searching stackoverflow than I am! However, BENY's answer below is better than the ones on that question, so I'm still glad I asked. Commented Oct 26, 2021 at 15:36

3 Answers 3

8

Try with wide_to_long

out = pd.wide_to_long(df,['V'],i=['A','B','X'],j='number').reset_index()
Out[23]: 
    A  B        X  number    V
0   0  a    alpha       1  0.0
1   0  a    alpha       2  1.0
2   0  a    alpha       3  2.0
3   1  b     beta       1  0.1
4   1  b     beta       2  1.1
5   1  b     beta       3  2.1
6   2  c    gamma       1  0.2
7   2  c    gamma       2  1.2
8   2  c    gamma       3  2.2
9   3  d    delta       1  0.3
10  3  d    delta       2  1.3
11  3  d    delta       3  2.3
12  4  e  epsilon       1  0.4
13  4  e  epsilon       2  1.4
14  4  e  epsilon       3  2.4
Sign up to request clarification or add additional context in comments.

Comments

5

Use melt:

>>> df.melt(id_vars=['A', 'B', 'X'], var_name='V Number', value_name='V')
    A  B        X V Number    V
0   0  a    alpha       V1  0.0
1   1  b     beta       V1  0.1
2   2  c    gamma       V1  0.2
3   3  d    delta       V1  0.3
4   4  e  epsilon       V1  0.4
5   0  a    alpha       V2  1.0
6   1  b     beta       V2  1.1
7   2  c    gamma       V2  1.2
8   3  d    delta       V2  1.3
9   4  e  epsilon       V2  1.4
10  0  a    alpha       V3  2.0
11  1  b     beta       V3  2.1
12  2  c    gamma       V3  2.2
13  3  d    delta       V3  2.3
14  4  e  epsilon       V3  2.4

Comments

0

You can also use .stack(), as follows:

(df.set_index(['A', 'B', 'X'])
   .rename_axis(columns='V Number')
   .stack()
   .reset_index(name='V')
)

Result:

    A  B        X V Number    V
0   0  a    alpha       V1  0.0
1   0  a    alpha       V2  1.0
2   0  a    alpha       V3  2.0
3   1  b     beta       V1  0.1
4   1  b     beta       V2  1.1
5   1  b     beta       V3  2.1
6   2  c    gamma       V1  0.2
7   2  c    gamma       V2  1.2
8   2  c    gamma       V3  2.2
9   3  d    delta       V1  0.3
10  3  d    delta       V2  1.3
11  3  d    delta       V3  2.3
12  4  e  epsilon       V1  0.4
13  4  e  epsilon       V2  1.4
14  4  e  epsilon       V3  2.4

If you want the V Number column to have only the number, you can use:

df2 = (df.set_index(['A', 'B', 'X'])
         .rename_axis(columns='V Number')
         .stack()
         .reset_index(name='V')
      )

df2['V Number'] = df2['V Number'].str[1:]

Result:

print(df2)

    A  B        X V Number    V
0   0  a    alpha        1  0.0
1   0  a    alpha        2  1.0
2   0  a    alpha        3  2.0
3   1  b     beta        1  0.1
4   1  b     beta        2  1.1
5   1  b     beta        3  2.1
6   2  c    gamma        1  0.2
7   2  c    gamma        2  1.2
8   2  c    gamma        3  2.2
9   3  d    delta        1  0.3
10  3  d    delta        2  1.3
11  3  d    delta        3  2.3
12  4  e  epsilon        1  0.4
13  4  e  epsilon        2  1.4
14  4  e  epsilon        3  2.4

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.