
I have a DataFrame indexed by "stimulus" and "position" (a MultiIndex), where only "peak_time" is a column:

stimulus position peak_time 
1        1        1.0
         2        1.5
2        1        2.0
         2        2.0
3        1        2.5

Now I'm trying to collapse the position level and get lists instead, so that it looks like this:

stimulus peak_time  
1        [1.0, 1.5]
2        [2.0, 2.0]
3        [2.5]

It's probably super easy, but I can't find any solution using Google. If someone has already opened this topic, I'd appreciate the respective link, too. Thanks for your help!

Code to create dataframe:

import random, scipy
import pandas as pd
trial     = [1,1,2,1,1,2,2,1,2]
stimulus  = [1,1,1,2,2,2,2,3,3] 
position  = [1,2,1,1,2,1,2,1,1]
peak_time = random.sample(range(1000), 9)
df        = pd.DataFrame({"trial": trial, "stimulus": stimulus, "position": position, "peak_time": peak_time})
median_   = df.groupby(['stimulus', 'position']).apply(scipy.nanmedian)
median_   = pd.DataFrame(median_)
median_.columns = ['peak_time']
median_

EDIT

As I can only post one question every 90 minutes, I'd like to ask a follow-up question underneath this post. So now I have two pandas Series looking like this:

median_:
stimulus
1    [1.0, 1.5]
2    [2.0, 2.0]
3    [2.0]

quartile_:
stimulus
1    [[1.0, 70.0],  [1.0, 183.25]]
2    [[1.0, 65.75], [2.0, 98.75]]
3    [[1.0, 51.25]]

I want to subtract median_ from quartile_ so that I get

distance_: 
stimulus
1   [1-1, 70-1], [1.5-1, 183.25-1.5]
2   [2-1, 65.75-1], [2-2, 98.75-2]
3   [2-1, 51.25-2]

Is there a simple way to do that? abs(median_ - quartile_) does not work.

Code to create series:

import random, scipy
import pandas as pd
trial     = [1,1,2,1,1,2,2,1,2]
stimulus  = [1,1,1,2,2,2,2,3,3] 
position  = [1,2,1,1,2,1,2,1,1]
peak_time = random.sample(range(1000), 9)
df        = pd.DataFrame({"trial": trial, "stimulus": stimulus, "position": position, "peak_time": peak_time})
median_   = df.groupby(['stimulus', 'position']).apply(scipy.nanmedian).groupby(level=0).apply(list)
quartile_ = df.groupby(['stimulus', 'position']).apply(lambda x: scipy.nanpercentile(x, [25, 75])).groupby(level=0).apply(list)

Solution

Apply groupby(level=0).apply(list) later, i.e. only after the subtraction, so

median_   = df.groupby(['stimulus', 'position']).apply(scipy.nanmedian)
quartile_ = df.groupby(['stimulus', 'position']).apply(lambda x: scipy.nanpercentile(x, [25, 75]))

Then I can subtract them easily

distance_ = abs(median_ - quartile_)
distance_ = distance_.groupby(level=0).apply(list)
distance_

stimulus
1   [1-1, 70-1], [1.5-1, 183.25-1.5]
2   [2-1, 65.75-1], [2-2, 98.75-2]
3   [2-1, 51.25-2]
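
For anyone who wants to reproduce the idea with fixed numbers, here is a minimal sketch. It is a variant of the workaround above, not the exact code: it assumes only peak_time should be aggregated, uses pandas' own groupby median() and quantile() instead of the scipy functions, and the sample values are made up for illustration. The quartiles are kept in two numeric columns until the subtraction is done, and the nesting into lists happens last.

```python
import pandas as pd

# Hypothetical fixed data (not from the question) so the result is reproducible.
df = pd.DataFrame({
    "stimulus":  [1, 1, 1, 2, 2, 2, 2, 3, 3],
    "position":  [1, 2, 1, 1, 2, 1, 2, 1, 1],
    "peak_time": [0.5, 1.5, 1.5, 2.0, 1.0, 2.0, 3.0, 2.0, 3.0],
})

grouped = df.groupby(["stimulus", "position"])["peak_time"]

# One median per (stimulus, position); NaN values are skipped by default.
median_ = grouped.median()

# 25th and 75th percentile per group, unstacked into two columns (0.25, 0.75).
quartile_ = grouped.quantile([0.25, 0.75]).unstack()

# Subtract the median row-wise while everything is still numeric.
distance_ = quartile_.sub(median_, axis=0).abs()

# Only now build the nested lists: one [q25_dist, q75_dist] pair per position,
# then one list of pairs per stimulus.
pairs = pd.Series(distance_.to_numpy().tolist(), index=distance_.index)
nested = pairs.groupby(level="stimulus").apply(list)

print(nested)
```

Keeping the values numeric until the last step means the subtraction is ordinary index-aligned arithmetic, and the lists are built purely for presentation.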

1 Answer


median_ is a Series with a MultiIndex, so you need Series.groupby on the first level, followed by apply(list):

# select the peak_time column before aggregating
median_   = df.groupby(['stimulus', 'position'])['peak_time'].apply(scipy.nanmedian)
df        = median_.groupby(level=0).apply(list).reset_index()
print(df)
   stimulus   peak_time
0         1  [1.0, 1.5]
1         2  [2.0, 2.0]
2         3       [2.5]
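
As a self-contained sketch of the same approach: the data below is hypothetical, picked so that the medians match the example in the question, and pandas' groupby median() (which skips NaN by default) stands in for scipy.nanmedian.

```python
import pandas as pd

# Hypothetical fixed data; the group medians match the question's example.
df = pd.DataFrame({
    "stimulus":  [1, 1, 1, 2, 2, 2, 2, 3, 3],
    "position":  [1, 2, 1, 1, 2, 1, 2, 1, 1],
    "peak_time": [0.5, 1.5, 1.5, 2.0, 1.0, 2.0, 3.0, 2.0, 3.0],
})

# Series with a (stimulus, position) MultiIndex, one median per group.
median_ = df.groupby(["stimulus", "position"])["peak_time"].median()

# Group again on the outer index level and collect each group's values in a list.
result = median_.groupby(level=0).apply(list).reset_index()

print(result)
```

groupby(level=0) groups by the first index level (stimulus), so each list holds the medians of all positions for that stimulus, in index order.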

4 Comments

That was probably the quickest response ever. Solves my problem entirely, thanks!! Gonna mark it as an answer in 6 min (stackoverflow makes me wait)..
I have a follow up question, maybe you know a quick solution for that, too?
@SvennoNito - Not so easy, because pandas doesn't natively support nested lists very well.
@jezreal, I found a workaround by applying your solution only to the distance measure: distance_ = median_ - quartile_, and then distance_.groupby(level=0).apply(list)
