Problem

I've got a wide dataframe that shows sale prices and volumes by state for various time periods, and I want to transform (unpivot) it into a long dataframe. This is easy enough to do in SQL with UNPIVOT, but I am struggling to figure out how to do it in pandas. Any help would be appreciated!

What I've tried

I've tried both pd.melt and pd.wide_to_long, without success. My wide_to_long attempt is shown in the example below.

Example

import pandas as pd

df = pd.DataFrame({'time': ['t1', 't2', 't3', 't4', 't5'],
                   'prod': ['A', 'B', 'C', 'D', 'E'],
                   'price_qld': [4, 3, 6, 3, 8],
                   'price_nsw': [7, 4, 7, 3, 5],
                   'price_vic': [9, 4, 6, 23, 7],
                   'vol_qld': [11, 43, 232, 234, 42],
                   'vol_nsw': [73, 44, 657, 53, 785],
                   'vol_vic': [95, 34, 666, 273, 87],
                   'flag_qld': [1, 1, 1, 1, 0],
                   'flag_nsw': [0, 1, 0, 1, 0],
                   'flag_vic': [1, 1, 1, 0, 1]
                   })
print(df)

new_df = pd.wide_to_long(df, ['price', 'vol', 'flag'], i=['time', 'prod'], j='State', sep='_')

Current Dataframe

  time prod  price_qld  price_nsw  ...  vol_vic  flag_qld  flag_nsw  flag_vic
0   t1    A          4          7  ...       95         1         0         1
1   t2    B          3          4  ...       34         1         1         1
2   t3    C          6          7  ...      666         1         0         1
3   t4    D          3          3  ...      273         1         1         0
4   t5    E          8          5  ...       87         0         0         1

Desired Dataframe

  time prod state  price  vol  flag
0   t1    A   qld      4   11     1
1   t1    A   nsw      7   73     0
2   t1    A   vic      9   95     1
3   t2    B   qld      3   43     1
4   t2    B   nsw      4   44     1
5   t2    B   vic      4   34     1
6   t3    C   qld      6  232     1
7   t3    C   nsw      7  657     0
8   t3    C   vic      6  666     1

3 Answers

You are close; you need suffix=r'\w+' so that non-integer suffixes are matched (the default, '\d+', only matches digits):

new_df = (pd.wide_to_long(df, ['price', 'vol', 'flag'],
                          i=['time', 'prod'],
                          j='State',
                          sep='_',
                          suffix=r'\w+')  # raw string avoids an invalid-escape warning
            .reset_index())

print(new_df)
   time prod State  price  vol  flag
0    t1    A   qld      4   11     1
1    t1    A   nsw      7   73     0
2    t1    A   vic      9   95     1
3    t2    B   qld      3   43     1
4    t2    B   nsw      4   44     1
5    t2    B   vic      4   34     1
6    t3    C   qld      6  232     1
7    t3    C   nsw      7  657     0
8    t3    C   vic      6  666     1
9    t4    D   qld      3  234     1
10   t4    D   nsw      3   53     1
11   t4    D   vic     23  273     0
12   t5    E   qld      8   42     0
13   t5    E   nsw      5  785     0
14   t5    E   vic      7   87     1
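
To see why the default fails here, it may help to test the two patterns directly. The sketch below is not the pandas implementation; it only assumes, as the wide_to_long documentation describes, that a column is picked up when its suffix matches the given regular expression, and it checks the question's suffixes with the standard re module. The helper name matching_suffixes is made up for illustration.

```python
import re

# Suffixes that follow 'price_', 'vol_' and 'flag_' in the question's columns
suffixes = ["qld", "nsw", "vic"]


def matching_suffixes(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]


# Default suffix pattern of wide_to_long: digits only, so nothing matches
digits_only = matching_suffixes(r"\d+", suffixes)

# Word characters (letters, digits, underscore): every state code matches
word_chars = matching_suffixes(r"\w+", suffixes)

print(digits_only)  # []
print(word_chars)   # ['qld', 'nsw', 'vic']
```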

Another approach:

# move the columns that have no separator into the index
new_df = df.set_index(['time', 'prod'])
# split the remaining column names on the separator into a MultiIndex
new_df.columns = new_df.columns.str.split('_', expand=True)
# stack the state level into rows
new_df = new_df.stack().reset_index().rename(columns={'level_2': 'state'})

print(new_df)
   time prod state  flag  price  vol
0    t1    A   nsw     0      7   73
1    t1    A   qld     1      4   11
2    t1    A   vic     1      9   95
3    t2    B   nsw     1      4   44
4    t2    B   qld     1      3   43
5    t2    B   vic     1      4   34
6    t3    C   nsw     0      7  657
7    t3    C   qld     1      6  232
8    t3    C   vic     1      6  666
9    t4    D   nsw     1      3   53
10   t4    D   qld     1      3  234
11   t4    D   vic     0     23  273
12   t5    E   nsw     0      5  785
13   t5    E   qld     0      8   42
14   t5    E   vic     1      7   87
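
The intermediate steps of this second approach can be inspected on a smaller frame. The following is a minimal sketch, assuming a two-row, two-state subset of the question's data. Naming the column level before stacking is a small variation on the code above (it avoids relying on the auto-generated 'level_2' label), not something the answer itself does.

```python
import pandas as pd

# Two-row, two-state subset of the question's data (illustrative only)
df = pd.DataFrame({
    "time": ["t1", "t2"],
    "prod": ["A", "B"],
    "price_qld": [4, 3],
    "price_nsw": [7, 4],
    "vol_qld": [11, 43],
    "vol_nsw": [73, 44],
})

# Step 1: identifier columns go into the index
wide = df.set_index(["time", "prod"])

# Step 2: 'price_qld' -> ('price', 'qld'), giving two-level column labels
wide.columns = wide.columns.str.split("_", expand=True)
column_pairs = wide.columns.tolist()

# Step 3: name the second level, then move it from the columns to the rows
wide.columns.names = [None, "state"]
long_df = wide.stack("state").reset_index()

print(column_pairs)
print(long_df)
```

Row and column order of the stacked result can differ between pandas versions, so it is safer to sort explicitly if a particular order matters.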

Another approach would be to use the pivot_longer function from pyjanitor (available on DataFrames after import janitor); it is a wrapper around pandas' melt, with more flexibility:

In [219]: df.pivot_longer(index = ['time', 'prod'], 
                          names_to=('.value', 'state'), 
                          names_sep="_")
Out[219]: 
   time prod state  price  vol  flag
0    t1    A   qld      4   11     1
1    t2    B   qld      3   43     1
2    t3    C   qld      6  232     1
3    t4    D   qld      3  234     1
4    t5    E   qld      8   42     0
5    t1    A   nsw      7   73     0
6    t2    B   nsw      4   44     1
7    t3    C   nsw      7  657     0
8    t4    D   nsw      3   53     1
9    t5    E   nsw      5  785     0
10   t1    A   vic      9   95     1
11   t2    B   vic      4   34     1
12   t3    C   vic      6  666     1
13   t4    D   vic     23  273     0
14   t5    E   vic      7   87     1

In [220]: df.pivot_longer(index = ['time', 'prod'], 
                          names_to=('.value', 'state'), 
                          names_sep="_", 
                          sort_by_appearance=True)
Out[220]: 
   time prod state  price  vol  flag
0    t1    A   qld      4   11     1
1    t1    A   nsw      7   73     0
2    t1    A   vic      9   95     1
3    t2    B   qld      3   43     1
4    t2    B   nsw      4   44     1
5    t2    B   vic      4   34     1
6    t3    C   qld      6  232     1
7    t3    C   nsw      7  657     0
8    t3    C   vic      6  666     1
9    t4    D   qld      3  234     1
10   t4    D   nsw      3   53     1
11   t4    D   vic     23  273     0
12   t5    E   qld      8   42     0
13   t5    E   nsw      5  785     0
14   t5    E   vic      7   87     1

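The '.value' placeholder tells pivot_longer that the part of each column name before names_sep ('_'), i.e. price, vol or flag, should remain a column header, while 'state' names the new column that collects the part after the separator (qld, nsw, vic).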
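
The way names_sep divides each header can be mimicked in plain Python. This is only an illustration of the naming rule described above, not pyjanitor's implementation, and the variable names are made up for the example.

```python
# A few of the question's column names (identifier columns excluded)
columns = ["price_qld", "price_nsw", "vol_qld", "vol_nsw", "flag_qld", "flag_nsw"]

# Split each name once on the separator: (part before, part after)
pairs = [tuple(name.split("_", 1)) for name in columns]

# Parts before the separator play the role of '.value': they stay column headers
value_headers = list(dict.fromkeys(before for before, _ in pairs))

# Parts after the separator become the entries of the new 'state' column
state_entries = list(dict.fromkeys(after for _, after in pairs))

print(value_headers)  # ['price', 'vol', 'flag']
print(state_entries)  # ['qld', 'nsw']
```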

The suffix '\w+' did not work for me: one of my column names is "Oil_LAZY M 23 CO 1HM", and \w does not match the spaces in it. The suffix '.+' worked perfectly, since it captures everything after the separator, which in my case was also an underscore.
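
A minimal sketch of that situation, assuming a made-up frame: only the column name "Oil_LAZY M 23 CO 1HM" comes from the description above, while the 'Gas' column, the 'id' column, the 'well' label and all values are invented for illustration.

```python
import pandas as pd

# Hypothetical data; only the 'Oil_LAZY M 23 CO 1HM' header is from the post
df = pd.DataFrame({
    "id": [1, 2],
    "Oil_LAZY M 23 CO 1HM": [10, 20],
    "Gas_LAZY M 23 CO 1HM": [5, 6],
})

# r'.+' accepts any non-empty suffix, including one that contains spaces
long_df = pd.wide_to_long(
    df,
    stubnames=["Oil", "Gas"],
    i="id",
    j="well",
    sep="_",
    suffix=r".+",
).reset_index()

print(long_df)
```

Because '.+' matches anything, it is probably best reserved for cases like this one, where the stub names and separator already identify the columns unambiguously.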
