Manipulating Pandas Dataframe with MultiIndex

Question

I have a pandas DataFrame formatted as such:

  mesh 1          energy low [eV] energy high [eV] nuclide score     mean  
           x    y   z                                                           
0          1    1   1        1.00e-03         2.00e+07   total  flux 0.00e+00   
1          1    1   2        1.00e-03         2.00e+07   total  flux 1.82e-03   
2          1    1   3        1.00e-03         2.00e+07   total  flux 6.96e-03   
3          1    1   4        1.00e-03         2.00e+07   total  flux 1.47e-03   
4          1    1   5        1.00e-03         2.00e+07   total  flux 6.93e-03   
5          1    1   6        1.00e-03         2.00e+07   total  flux 8.73e-03   
6          1    1   7        1.00e-03         2.00e+07   total  flux 1.34e-02   
7          1    1   8        1.00e-03         2.00e+07   total  flux 1.16e-02   
8          1    1   9        1.00e-03         2.00e+07   total  flux 4.14e-03   
9          1    1  10        1.00e-03         2.00e+07   total  flux 5.26e-03   
10         1    2   1        1.00e-03         2.00e+07   total  flux 6.16e-03   
11         1    2   2        1.00e-03         2.00e+07   total  flux 1.76e-02   
12         1    2   3        1.00e-03         2.00e+07   total  flux 1.80e-02   
13         1    2   4        1.00e-03         2.00e+07   total  flux 1.97e-02   
14         1    2   5        1.00e-03         2.00e+07   total  flux 1.76e-02   
15         1    2   6        1.00e-03         2.00e+07   total  flux 1.90e-02   
16         1    2   7        1.00e-03         2.00e+07   total  flux 3.53e-02   
17         1    2   8        1.00e-03         2.00e+07   total  flux 0.00e+00   
18         1    2   9        1.00e-03         2.00e+07   total  flux 0.00e+00   
19         1    2  10        1.00e-03         2.00e+07   total  flux 0.00e+00   
20         1    3   1        1.00e-03         2.00e+07   total  flux 0.00e+00   
21         1    3   2        1.00e-03         2.00e+07   total  flux 0.00e+00   
22         1    3   3        1.00e-03         2.00e+07   total  flux 0.00e+00   
23         1    3   4        1.00e-03         2.00e+07   total  flux 0.00e+00   
24         1    3   5        1.00e-03         2.00e+07   total  flux 0.00e+00   
25         1    3   6        1.00e-03         2.00e+07   total  flux 0.00e+00   
26         1    3   7        1.00e-03         2.00e+07   total  flux 0.00e+00   
27         1    3   8        1.00e-03         2.00e+07   total  flux 0.00e+00   
28         1    3   9        1.00e-03         2.00e+07   total  flux 0.00e+00   
29         1    3  10        1.00e-03         2.00e+07   total  flux 0.00e+00   
...      ...  ...  ..             ...              ...     ...   ...      ...   
99970    100   98   1        1.00e-03         2.00e+07   total  flux 0.00e+00   
99971    100   98   2        1.00e-03         2.00e+07   total  flux 0.00e+00   
99972    100   98   3        1.00e-03         2.00e+07   total  flux 0.00e+00   
99973    100   98   4        1.00e-03         2.00e+07   total  flux 0.00e+00   
99974    100   98   5        1.00e-03         2.00e+07   total  flux 0.00e+00   
99975    100   98   6        1.00e-03         2.00e+07   total  flux 0.00e+00   
99976    100   98   7        1.00e-03         2.00e+07   total  flux 0.00e+00   
99977    100   98   8        1.00e-03         2.00e+07   total  flux 0.00e+00   
99978    100   98   9        1.00e-03         2.00e+07   total  flux 0.00e+00   
99979    100   98  10        1.00e-03         2.00e+07   total  flux 0.00e+00   
99980    100   99   1        1.00e-03         2.00e+07   total  flux 0.00e+00   
99981    100   99   2        1.00e-03         2.00e+07   total  flux 0.00e+00   
99982    100   99   3        1.00e-03         2.00e+07   total  flux 0.00e+00   
99983    100   99   4        1.00e-03         2.00e+07   total  flux 0.00e+00   
99984    100   99   5        1.00e-03         2.00e+07   total  flux 0.00e+00   
99985    100   99   6        1.00e-03         2.00e+07   total  flux 0.00e+00   
99986    100   99   7        1.00e-03         2.00e+07   total  flux 0.00e+00   
99987    100   99   8        1.00e-03         2.00e+07   total  flux 0.00e+00   
99988    100   99   9        1.00e-03         2.00e+07   total  flux 0.00e+00   
99989    100   99  10        1.00e-03         2.00e+07   total  flux 0.00e+00   
99990    100  100   1        1.00e-03         2.00e+07   total  flux 0.00e+00   
99991    100  100   2        1.00e-03         2.00e+07   total  flux 0.00e+00   
99992    100  100   3        1.00e-03         2.00e+07   total  flux 0.00e+00   
99993    100  100   4        1.00e-03         2.00e+07   total  flux 0.00e+00   
99994    100  100   5        1.00e-03         2.00e+07   total  flux 0.00e+00   
99995    100  100   6        1.00e-03         2.00e+07   total  flux 0.00e+00   
99996    100  100   7        1.00e-03         2.00e+07   total  flux 0.00e+00   
99997    100  100   8        1.00e-03         2.00e+07   total  flux 0.00e+00   
99998    100  100   9        1.00e-03         2.00e+07   total  flux 0.00e+00   
99999    100  100  10        1.00e-03         2.00e+07   total  flux 0.00e+00   

RangeIndex(start=0, stop=100000, step=1)
MultiIndex(levels=[['energy high [eV]', 'energy low [eV]', 'mean', 'mesh 1', 'nuclide', 'score', 'std. dev.'], ['', 'x', 'y', 'z']],
           labels=[[3, 3, 3, 1, 0, 4, 5, 2, 6], [1, 2, 3, 0, 0, 0, 0, 0, 0]])

I would like to have 10 pandas dataframes in a list, one for each value of ('mesh 1', 'z'), which runs from 1 to 10. In each dataframe the rows should be ('mesh 1', 'y'), the columns ('mesh 1', 'x') and the values 'mean'. I have figured out how to get the 10 dataframes in a list:

axial_dfs = []
for i in range(1, 11):  # 'z' runs from 1 to 10, not 0 to 9
    temp_df = flux_df[flux_df['mesh 1']['z'] == i]
    axial_dfs.append(temp_df)

But I can't figure out how to change the rows and columns. I would try pivot but I don't know how with the MultiIndex for 'mesh 1'.

Appreciate all the help! Thanks!

Can you include the index and columns of the original dataframe? Just print df.index and df.columns and add the output to your question. I ask because the raw dataframe is a bit hard to read.
– NickBraunagel, Commented Mar 26, 2018 at 18:43

NickBraunagel · Accepted Answer · 2018-03-26 20:42:54Z

I'm a little confused about what you need but I think merging the column levels together in your temp_df will help you:

axial_dfs = []
for i in range(1, 11):  # 'z' runs from 1 to 10, not 0 to 9
    temp_df = flux_df[flux_df['mesh 1']['z'] == i].copy()  # copy so the rename does not act on a slice
    temp_df.columns = temp_df.columns.map('_'.join)  # add this line
    axial_dfs.append(temp_df)

Now all of the frames in axial_dfs have a single level of columns (e.g. mesh 1_x or mesh 1_y), which it sounds like you're comfortable manipulating on your own (using pandas.DataFrame.pivot_table or pandas.DataFrame.groupby). Note that columns whose second level is empty keep a trailing underscore after the join, so mean becomes mean_.
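As a rough sketch of the pivot step that follows the column merge, the snippet below builds a small stand-in for flux_df (a 2 x 2 x 2 mesh with made-up mean values, not the asker's data) and pivots each z slice. The flattened names it pivots on, including mean_, are what '_'.join produces for this assumed column layout; printing temp_df.columns on the real frame is the way to confirm them.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's frame: two-level columns,
# a 2 x 2 x 2 mesh, and invented "mean" values encoding (x, y, z).
columns = pd.MultiIndex.from_tuples(
    [("mesh 1", "x"), ("mesh 1", "y"), ("mesh 1", "z"), ("mean", "")]
)
rows = [(x, y, z, 100 * x + 10 * y + z)
        for x in (1, 2) for y in (1, 2) for z in (1, 2)]
flux_df = pd.DataFrame(np.array(rows), columns=columns)

axial_dfs = []
for z in (1, 2):
    # Select one z slice and copy it before renaming its columns.
    temp_df = flux_df[flux_df[("mesh 1", "z")] == z].copy()
    # Flatten ('mesh 1', 'x') to 'mesh 1_x'; ('mean', '') becomes 'mean_'.
    temp_df.columns = temp_df.columns.map("_".join)
    # Rows are y, columns are x, cell values are the mean.
    axial_dfs.append(
        temp_df.pivot(index="mesh 1_y", columns="mesh 1_x", values="mean_")
    )

print(axial_dfs[0])
```

Each element of axial_dfs is then a y-by-x table for one z value.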

4 Comments

Sterling Butters Over a year ago

Yes this will help me so much

Sterling Butters Over a year ago

Would this potentially invoke some sort of KeyError? I can't use pivot because I get a KeyError no matter what I put as the index, the columns, or the values.

NickBraunagel Over a year ago

Print the columns of temp_df (print(temp_df.columns)) to see what the merged columns are now called. You are probably referencing the column names of the original frame, which is what throws the KeyError.

Sterling Butters Over a year ago

Not sure if that was actually the problem, but I'm guessing pivot does not modify the frame in place, because assigning the pivoted table to a new variable got it working. I may have changed some other stuff around too, but I wouldn't have got it without the columns.map, so thank you!
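One plausible reading of that last comment is that DataFrame.pivot returns a new frame and leaves the original untouched, so the result has to be assigned. A minimal illustration with a made-up frame (long_df and its values are hypothetical):

```python
import pandas as pd

# Hypothetical long-format frame with already-flattened column names.
long_df = pd.DataFrame(
    {"x": [1, 1, 2, 2], "y": [1, 2, 1, 2], "mean": [0.1, 0.2, 0.3, 0.4]}
)

# Calling pivot without assigning the result leaves long_df as it was.
long_df.pivot(index="y", columns="x", values="mean")
columns_after_call = list(long_df.columns)

# Assigning the return value keeps the reshaped table.
wide_df = long_df.pivot(index="y", columns="x", values="mean")
print(wide_df)
```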

Jordi · Accepted Answer · 2018-03-26 19:32:31Z

In the following example, I use unstack to turn the second index level into a column index. Then, I use a list comprehension to split the result into a list determined by the levels of the first index.

import pandas as pd
import numpy as np

# Create simple example
data = np.random.randint(8, size=(8, 2))
levels = [['df1', 'df2'], ['a', 'b'], [1, 2]]
idx = pd.MultiIndex.from_product(levels, names=['first', 'second', 'third'])
df = pd.DataFrame(data, index=idx, columns=['col1', 'col2'])

# Step 1: unstack to get second level as column index
df = df.unstack(level='second')['col2']

# Step 2: get a list of chunks of df by first index level
first_unique = df.index.get_level_values('first').unique()
df_ls = [df.loc[x] for x in first_unique]
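
The same unstack idea might carry over to the layout in the question roughly as follows. This is a sketch under assumptions: flux_df here is a tiny made-up stand-in (2 x 2 x 2 mesh, invented mean values), and the index level names z, y, x are chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's frame with two-level columns.
columns = pd.MultiIndex.from_tuples(
    [("mesh 1", "x"), ("mesh 1", "y"), ("mesh 1", "z"), ("mean", "")]
)
rows = [(x, y, z, 100 * x + 10 * y + z)
        for x in (1, 2) for y in (1, 2) for z in (1, 2)]
flux_df = pd.DataFrame(np.array(rows), columns=columns)

# Move the mesh coordinates into a (z, y, x) row index for the mean values.
idx = pd.MultiIndex.from_arrays(
    [flux_df[("mesh 1", name)].to_numpy() for name in ("z", "y", "x")],
    names=["z", "y", "x"],
)
mean = pd.Series(flux_df[("mean", "")].to_numpy(), index=idx)

# Step 1: unstack x so it becomes the column index.
wide = mean.unstack("x")

# Step 2: split into one y-by-x frame per z value.
z_values = wide.index.get_level_values("z").unique()
axial_dfs = [wide.loc[z] for z in z_values]
print(axial_dfs[0])
```

This avoids renaming the columns, at the cost of building the row index by hand.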
