1

I have a dataframe which contains various datapoints over MULTIPLE years from ONE unit. This unit number is listed in the first column of my dataframe, named 'Unit'. The year variable is in the second column.

For visalisation, this is a mini version of my dataset. In reality is the size: 55 columns by 700000 rows.

import random
col3=[random.randrange(1,101,1) for _ in range (14)]
col4=[random.randrange(1,101,1) for _ in range (14)]

d = {'Unit': [1, 1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 6], 
     'Year': [2014, 2015, 2016, 2017, 2015, 2016, 2017, 2017, 2014, 2015, 2014, 2015, 2016, 2017], 'col3' : col3, 'col4' : col4 }
df = pd.DataFrame(data=d)

With this dataset I want to look at the ratios between col3 and col4 within a year and between years. For this reason I want to make a three dimensional dataframe, which places year on an additional axis and not as a variable in my 2D frame.

Does anyone have tips on how to do this? and is this a good approach? suggestions?

Jen

3
  • Do you need pivot ? Commented Jun 5, 2019 at 12:13
  • Perhaps using the MultiIndex? Or create separate frame for each year and do comparisons that way? Commented Jun 5, 2019 at 12:15
  • 1
    There was something called a Panel in pandas, which might be what you want? But it's deprecated now (look into the link for alternatives). Commented Jun 5, 2019 at 12:27

1 Answer 1

3

Because Panel is deprecated now, here is alternative with MultiIndex:

df1 = df.set_index(['Unit','Year']).unstack()
print (df1)
      col3                    col4                  
Year  2014  2015  2016  2017  2014  2015  2016  2017
Unit                                                
1     58.0  80.0  37.0  79.0  12.0  32.0  22.0  39.0
2      NaN  90.0  46.0  98.0   NaN  73.0  95.0  28.0
3      NaN   NaN   NaN  52.0   NaN   NaN   NaN  90.0
4     90.0   NaN   NaN   NaN  92.0   NaN   NaN   NaN
5      NaN   9.0   NaN   NaN   NaN   3.0   NaN   NaN
6     33.0  22.0  41.0  40.0  21.0   5.0  69.0  10.0

df2 = df.set_index(['Year','Unit']).unstack()
print (df2)
      col3                               col4                             
Unit     1     2     3     4    5     6     1     2     3     4    5     6
Year                                                                      
2014  58.0   NaN   NaN  90.0  NaN  33.0  12.0   NaN   NaN  92.0  NaN  21.0
2015  80.0  90.0   NaN   NaN  9.0  22.0  32.0  73.0   NaN   NaN  3.0   5.0
2016  37.0  46.0   NaN   NaN  NaN  41.0  22.0  95.0   NaN   NaN  NaN  69.0
2017  79.0  98.0  52.0   NaN  NaN  40.0  39.0  28.0  90.0   NaN  NaN  10.0

Another way is use xarray.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.