I have a DataFrame that looks something like this:

   Deal  Year  Quarter_1  Quarter_2  Quarter_3  Financial_Data
h     1  1991          1          2          3             120
i     2  1992          4          5          6              80
j     3  1993          7          8          9             100

I want to combine all the quarters into one new column and copy the deal number, year and financial data. The end result should then look like this:

   Deal  Year  Quarter  Financial_Data
h     1  1991        1             120
i     1  1991        2             120
j     1  1991        3             120
k     2  1992        4              80
l     2  1992        5              80
m     2  1992        6              80
n     3  1993        7             100
o     3  1993        8             100
p     3  1993        9             100
  • What have you tried so far, and how is it not working as expected? Commented Apr 30, 2018 at 10:01
  • I haven't really tried anything; I'm new to Python and don't know how I would even approach this problem. Commented Apr 30, 2018 at 10:04

3 Answers

You can use the melt method.

res = pd.melt(df, id_vars=["Deal", "Year", "Financial_Data"],
              value_name="Quarter").drop(['variable'], axis=1).sort_values('Quarter')

Output

   Deal  Year  Financial_Data  Quarter
0     1  1991             120        1
3     1  1991             120        2
6     1  1991             120        3
1     2  1992              80        4
4     2  1992              80        5
7     2  1992              80        6
2     3  1993             100        7
5     3  1993             100        8
8     3  1993             100        9

If you have many columns, you can build the id_vars list from df.columns.tolist() instead of typing every name. Here it takes the first two columns and the last one:

column_list = df.columns.tolist()
id_vars_list = column_list[:2] + column_list[-1:]

The statement then becomes:

res = pd.melt(df, id_vars=id_vars_list,
              value_name="Quarter").drop(['variable'], axis=1).sort_values('Quarter')
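
When slicing by position is fragile, the identifier columns can also be picked by name. This is a minimal sketch, assuming every quarter column shares the Quarter_ prefix as in the question's sample; the names quarter_cols and id_cols are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame(
    {
        "Deal": [1, 2, 3],
        "Year": [1991, 1992, 1993],
        "Quarter_1": [1, 4, 7],
        "Quarter_2": [2, 5, 8],
        "Quarter_3": [3, 6, 9],
        "Financial_Data": [120, 80, 100],
    },
    index=["h", "i", "j"],
)

# Assumption: every column to unpivot starts with "Quarter_".
quarter_cols = [c for c in df.columns if c.startswith("Quarter_")]
# Everything else is treated as an identifier column.
id_cols = [c for c in df.columns if c not in quarter_cols]

res = (
    pd.melt(df, id_vars=id_cols, value_vars=quarter_cols, value_name="Quarter")
    .drop(columns="variable")
    .sort_values("Quarter")
)
print(res)
```

With 200 columns, only the prefix test needs to change; nothing has to be typed out by hand.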

2 Comments

Thanks for your answer! One more question: I have a rather large dataset with over 200 columns. Is there a shortcut so I don't have to enter all the headings into id_vars?
@Dan, my thought was to get the first two columns and the last one.

This is done using melt:

pd.melt(df, id_vars=['Deal','Year','Financial_Data'], value_vars=['Quarter_1','Quarter_2','Quarter_3'])
   Deal  Year  Financial_Data   variable  value
0     1  1991             120  Quarter_1      1
1     2  1992              80  Quarter_1      4
2     3  1993             100  Quarter_1      7
3     1  1991             120  Quarter_2      2
4     2  1992              80  Quarter_2      5
5     3  1993             100  Quarter_2      8
6     1  1991             120  Quarter_3      3
7     2  1992              80  Quarter_3      6
8     3  1993             100  Quarter_3      9

Cleaning it up a little:

>>> pd.melt(df, id_vars=['Deal','Year','Financial_Data'], value_vars=['Quarter_1','Quarter_2','Quarter_3']).drop('variable',axis=1).sort_values('value')
   Deal  Year  Financial_Data  value
0     1  1991             120      1
3     1  1991             120      2
6     1  1991             120      3
1     2  1992              80      4
4     2  1992              80      5
7     2  1992              80      6
2     3  1993             100      7
5     3  1993             100      8
8     3  1993             100      9
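
To get closer to the layout the question asks for, the value column can be named at melt time and the index renumbered afterwards. This is a sketch on the question's sample data; the name source_column and the choice to sort by Deal and then Quarter are assumptions made for illustration.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame(
    {
        "Deal": [1, 2, 3],
        "Year": [1991, 1992, 1993],
        "Quarter_1": [1, 4, 7],
        "Quarter_2": [2, 5, 8],
        "Quarter_3": [3, 6, 9],
        "Financial_Data": [120, 80, 100],
    },
    index=["h", "i", "j"],
)

long_df = (
    pd.melt(
        df,
        id_vars=["Deal", "Year", "Financial_Data"],
        value_vars=["Quarter_1", "Quarter_2", "Quarter_3"],
        var_name="source_column",  # illustrative name for the old column label
        value_name="Quarter",      # name the melted values directly
    )
    .drop(columns="source_column")
    .sort_values(["Deal", "Quarter"])
    .reset_index(drop=True)        # replace the shuffled index with 0..n-1
)

# Reorder the columns to match the layout shown in the question.
long_df = long_df[["Deal", "Year", "Quarter", "Financial_Data"]]
print(long_df)
```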

1 Comment

Yes, just extract a list of the columns you want from df.columns

One way is to combine your Quarter_X data into a list. Then expand the list series via numpy / itertools in a new dataframe.

This is usually more efficient than stack- or groupby-based methods. Note that each output row keeps the index label of its parent row, so reset the index (for example with reset_index) if you need unique labels.

from itertools import chain
import numpy as np
import pandas as pd

# Collect each row's quarter values into one tuple
df['Quarters'] = list(zip(df.Quarter_1, df.Quarter_2, df.Quarter_3))

# Number of quarter values per row, used as the repeat counts
lens = list(map(len, df.Quarters))

res = pd.DataFrame({'Deal': np.repeat(df.Deal, lens),
                    'Year': np.repeat(df.Year, lens),
                    'Quarter': list(chain.from_iterable(df.Quarters)),
                    'FinancialData': np.repeat(df.Financial_Data, lens)})

print(res)

   Deal  FinancialData  Quarter  Year
h     1            120        1  1991
h     1            120        2  1991
h     1            120        3  1991
i     2             80        4  1992
i     2             80        5  1992
i     2             80        6  1992
j     3            100        7  1993
j     3            100        8  1993
j     3            100        9  1993

With many columns, typing out each one as above becomes tedious; a dict comprehension can build every non-quarter column instead:

res = pd.DataFrame({**{'Quarter': list(chain.from_iterable(df.Quarters))},
                    **{k: np.repeat(df[k], lens) for k in df if 'Quarter' not in k}})
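
The repeat step can also be applied to whole rows rather than one column at a time. This is a sketch of that variant, not the code above. It assumes the quarter columns share the Quarter_ prefix and appear in the order they should be flattened; quarter_cols, others and positions are illustrative names.

```python
import numpy as np
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame(
    {
        "Deal": [1, 2, 3],
        "Year": [1991, 1992, 1993],
        "Quarter_1": [1, 4, 7],
        "Quarter_2": [2, 5, 8],
        "Quarter_3": [3, 6, 9],
        "Financial_Data": [120, 80, 100],
    },
    index=["h", "i", "j"],
)

# Assumption: every column to flatten starts with "Quarter_".
quarter_cols = [c for c in df.columns if c.startswith("Quarter_")]
others = df.drop(columns=quarter_cols)

# Repeat each row position once per quarter column: 0,0,0,1,1,1,...
positions = np.repeat(np.arange(len(df)), len(quarter_cols))
res = others.iloc[positions].copy()  # rows keep their parent index labels

# Row-major flatten: all quarters of the first row, then the second, and so on.
res["Quarter"] = df[quarter_cols].to_numpy().ravel()
print(res)
```

As with the approach above, the result keeps the parent index labels (h, h, h, i, ...), so reset the index if unique labels are needed.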

1 Comment

I have to do this for a rather large dataset, around 200 different columns, is there any way that I do not have to type out 'Deal': np.repeat(df.Deal, lens) for every single heading?
