1

I have a dataframe

df1 = pd.DataFrame(data={'col1': [21, 44, 28, 32, 20, 39, 42], 
                         'col2': ['<1', '>2', '>2', '>3', '<1', '>2', '>4'], 
                         'col3': ['yes', 'yes', 'no', 'no', 'yes', 'no', 'yes'], 
                         'col4': [1, 1, 0, 0, 1, 0, 1], 
                         'Group': [0, 2, 1, 1, 0, 1, 2] })
   col1 col2 col3  col4  Group
0    21   <1  yes     1      0
1    44   >2  yes     1      2
2    28   >2   no     0      1
3    32   >3   no     0      1
4    20   <1  yes     1      0
5    39   >2   no     0      1
6    42   >4  yes     1      2

Based on 'Group' column values 0,1,2 I need to create a nested dictionary like

{0: {'col1': {'mean': 20.5, 'sd': 0.5},
  'col2': ['<1'],
  'col3': ['yes'],
  'col4': {'mean': 1, 'sd': 0}},
 1: {'col1': {'mean': 33, 'sd': 4.54},
  'col2': ['>2', '>3'],
  'col3': ['no'],
  'col4': {'mean': 0, 'sd': 0}},
 2: {'col1': {'mean': 43, 'sd': 1},
  'col2': ['>2', '>4'],
  'col3': ['yes'],
  'col4': {'mean': 1, 'sd': 0}}}

col1 and col4 are numeric, col2 and col3 are string

For Group 0:

'col1' : mean and stddev of values 20,21

'col2' : which has only '<1'

'col3' : has only 'yes'

'col4' : mean and stddev of values 1,1

For Group 1:

'col1' : mean and stddev of values 28,32,39

'col2' : which has '>2' and '>3'

'col3' : has only 'no'

'col4' : mean and stddev of values 0,0,0

For Group 2:

'col1' : mean and stddev of values 42,44

'col2' : which has '>2' and '>4'

'col3' : has only 'yes'

'col4' : mean and stddev of values 1,1

2
  • What the logic you split them into different group Commented Jun 9, 2022 at 2:37
  • Based on group column values 0,1,2 Commented Jun 9, 2022 at 2:39

1 Answer 1

3
df1.groupby('Group').agg(lambda x: x.unique().tolist() if x.dtype =='object' 
                         else {'mean':x.mean(), 'std':x.std(ddof=0)}).T.to_dict('index')

Out[396]: 
{0: {'col1': {'mean': 20.5, 'std': 0.5},
  'col2': ['<1'],
  'col3': ['yes'],
  'col4': {'mean': 1.0, 'std': 0.0}},
 1: {'col1': {'mean': 33.0, 'std': 4.546060565661952},
  'col2': ['>2', '>3'],
  'col3': ['no'],
  'col4': {'mean': 0.0, 'std': 0.0}},
 2: {'col1': {'mean': 43.0, 'std': 1.0},
  'col2': ['>2', '>4'],
  'col3': ['yes'],
  'col4': {'mean': 1.0, 'std': 0.0}}}
Sign up to request clarification or add additional context in comments.

2 Comments

Nice one, mine overthinking ~ ~
You may can change the last part to .to_dict('index')

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.