How to aggregate pandas data into a flat DataFrame (without hierarchical indexes)?

Question

I have this data of measurements at two time values with replicates:

name  t  value  replicate
foo   1  0.5    a
foo   1  0.55   b
foo   1  0.6    c
foo   2  0.7    a
foo   2  0.71   b
foo   2  0.72   c
bar   1  0.1    a
bar   1  0.12   b
bar   1  0.3    c
bar   2  0.4    a
bar   2  0.45   b
bar   2  0.44   c

I want to parse it into a DataFrame and get the mean and standard deviation of the replicates for each time point ("t" column) and each sample ("name" column). This can be done with:

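import pandas as pd

df = pd.read_table("data.txt", sep="\t")
# Select "value" explicitly so the non-numeric "replicate" column is not aggregated
g = df.groupby(["name", "t"])[["value"]]
new_df = g.agg(["mean", "std"])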

The problem is that new_df has a hierarchical index:

           value          
            mean       std
name t                    
bar  1  0.173333  0.110151
     2  0.430000  0.026458
foo  1  0.550000  0.050000
     2  0.710000  0.010000

How can I get a flat dataframe instead where the mean and std values are just regular columns? I tried reset_index() but that does not do it:

>>> new_df.reset_index()
  name  t     value          
               mean       std
0  bar  1  0.173333  0.110151
1  bar  2  0.430000  0.026458
2  foo  1  0.550000  0.050000
3  foo  2  0.710000  0.010000

I'd like the final DataFrame to have the columns sample, t, mean, std (or value_mean, value_std). How can this be done in pandas?

wflynny · Accepted Answer · 2016-04-04 18:25:20Z

3

I would do something slightly different from MaxU. Try resetting the index to a specific column level and then drop the other column level(s).

In [5]: new_df2 = new_df.copy()

In [6]: new_df2 = new_df2.reset_index(col_level=1)

In [7]: new_df2.columns = new_df2.columns.get_level_values(1) # same level=1

In [8]: new_df2
Out[8]: 
  name  t      mean       std
0  bar  1  0.173333  0.110151
1  bar  2  0.430000  0.026458
2  foo  1  0.550000  0.050000
3  foo  2  0.710000  0.010000

Edit:

A MultiIndex can be used to set up a multi-level arrangement of either your row index or your column labels (your case). It stores the unique label values of each level in levels, and the integer positions into those levels in labels (the attribute is called codes in pandas 0.24 and later). Like this:

In [4]: new_df.columns
Out[4]: 
MultiIndex(levels=[[u'value'], [u'mean', u'std']],
           labels=[[0, 0], [0, 1]])
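As a rough, self-contained sketch of how levels and positions combine, the snippet below rebuilds the question's data by hand and reconstructs the column tuples from them. It assumes pandas 0.24 or later, where the positions are exposed as codes rather than labels; the variable names are only illustrative.

```python
import pandas as pd

# Data from the question, typed in by hand instead of read from data.txt
df = pd.DataFrame({
    "name": ["foo"] * 6 + ["bar"] * 6,
    "t": [1, 1, 1, 2, 2, 2] * 2,
    "value": [0.5, 0.55, 0.6, 0.7, 0.71, 0.72,
              0.1, 0.12, 0.3, 0.4, 0.45, 0.44],
    "replicate": list("abc") * 4,
})
new_df = df.groupby(["name", "t"])[["value"]].agg(["mean", "std"])

cols = new_df.columns
# Unique label values, one list per level
level_values = [list(level) for level in cols.levels]
# Integer positions into those lists, one list per level
codes = [c.tolist() for c in cols.codes]

# Pairing the positions level by level recovers the full column tuples
rebuilt = [(level_values[0][i], level_values[1][j]) for i, j in zip(*codes)]
print(level_values)
print(codes)
print(rebuilt)
```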

By doing reset_index(col_level=1), we transform the MultiIndex into

In [5]: new_df.reset_index(col_level=1).columns
Out[5]: 
MultiIndex(levels=[[u'value', u''], [u'mean', u'std', u't', u'name']],
           labels=[[1, 1, 0, 0], [3, 2, 0, 1]])

This takes the row index levels (name and t) out of the index and puts their names into level 1 (the second/lower level) of the column MultiIndex, leaving level 0 empty for those two columns. Then columns = columns.get_level_values(1) grabs the column labels at level 1 and sets only those values as the column labels, effectively dropping level 0.

 Out[6]: Index([u'name', u't', u'mean', u'std'], dtype='object')
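Putting the steps together, a minimal end-to-end sketch might look like the following. The data is typed in by hand to keep it self-contained. The droplevel variant at the end is not part of the answer above; it is an alternative that assumes pandas 0.24 or later, where DataFrame.droplevel is available.

```python
import pandas as pd

# Data from the question, typed in by hand instead of read from data.txt
df = pd.DataFrame({
    "name": ["foo"] * 6 + ["bar"] * 6,
    "t": [1, 1, 1, 2, 2, 2] * 2,
    "value": [0.5, 0.55, 0.6, 0.7, 0.71, 0.72,
              0.1, 0.12, 0.3, 0.4, 0.45, 0.44],
    "replicate": list("abc") * 4,
})
new_df = df.groupby(["name", "t"])[["value"]].agg(["mean", "std"])

# Approach from the answer: put name/t into column level 1,
# then keep only the level-1 labels
flat = new_df.reset_index(col_level=1)
flat.columns = flat.columns.get_level_values(1)

# Alternative (assumption: pandas >= 0.24): drop column level 0 first,
# then reset the row index as usual
flat_alt = new_df.droplevel(0, axis=1).reset_index()

print(flat)
```

Both variants discard the top-level "value" label, so they only make sense when a single column was aggregated; with several aggregated columns the resulting names would collide.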

edited Apr 4, 2016 at 18:25

answered Mar 31, 2016 at 19:07


1 Comment

mvd Over a year ago

Can you explain what get_level_values does here?

MaxU - stand with Ukraine · Accepted Answer · 2016-03-31 20:11:04Z

2

Try renaming your columns:

In [9]: new_df.reset_index(inplace=True)

Let's set the column names as follows: take the level-1 label if it is non-empty, otherwise fall back to the level-0 label.

In [14]: new_df.columns = [c[1] if c[1] else c[0] for c in new_df.columns.tolist()]

In [15]: new_df
Out[15]:
  name  t      mean       std
0  bar  1  0.173333  0.110151
1  bar  2  0.430000  0.026458
2  foo  1  0.550000  0.050000
3  foo  2  0.710000  0.010000
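The same list-comprehension idea can be adapted to produce the value_mean / value_std names mentioned in the question. The sketch below is one possible variation, not the answer's exact code: it joins the non-empty parts of each column tuple with an underscore, which also avoids name collisions when more than one column is aggregated.

```python
import pandas as pd

# Data from the question, typed in by hand instead of read from data.txt
df = pd.DataFrame({
    "name": ["foo"] * 6 + ["bar"] * 6,
    "t": [1, 1, 1, 2, 2, 2] * 2,
    "value": [0.5, 0.55, 0.6, 0.7, 0.71, 0.72,
              0.1, 0.12, 0.3, 0.4, 0.45, 0.44],
    "replicate": list("abc") * 4,
})
new_df = df.groupby(["name", "t"])[["value"]].agg(["mean", "std"]).reset_index()

# After reset_index() each column is a tuple such as ("name", "") or
# ("value", "mean"); join the non-empty parts with "_"
new_df.columns = ["_".join(part for part in col if part) for col in new_df.columns]

print(new_df.columns.tolist())
```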

edited Mar 31, 2016 at 20:11

answered Mar 31, 2016 at 18:14


2 Comments

mvd Over a year ago

Can you explain what your code does and whether it will generalize? Is there a pandas built-in that does the same?

MaxU - stand with Ukraine Over a year ago

@mvd, I've added a comment to my answer. Please check.
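On the question of a built-in: neither answer mentions it, but pandas 0.25 and later support "named aggregation", which produces flat column names directly and so avoids the column MultiIndex altogether. A minimal sketch, assuming such a version is installed (the output names value_mean and value_std are chosen here for illustration):

```python
import pandas as pd

# Data from the question, typed in by hand instead of read from data.txt
df = pd.DataFrame({
    "name": ["foo"] * 6 + ["bar"] * 6,
    "t": [1, 1, 1, 2, 2, 2] * 2,
    "value": [0.5, 0.55, 0.6, 0.7, 0.71, 0.72,
              0.1, 0.12, 0.3, 0.4, 0.45, 0.44],
    "replicate": list("abc") * 4,
})

# Each keyword is an output column name mapped to (input column, function)
flat = (
    df.groupby(["name", "t"])
      .agg(value_mean=("value", "mean"), value_std=("value", "std"))
      .reset_index()
)

print(flat)
```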
