2

I have a dataframe of lists, each value in the list represents the mean, std, and number of values of a larger dataset. I would like to create a subindex for three values in that list.

An example dataframe is:

np.random.seed(2)
d={i: {j:[np.random.randint(10) for i in range(0,3)] for j in ['x','y','z']} for i in ['a','b','c']}
pd.DataFrame.from_dict(d,orient='index')

Which gives:

    x   y   z
a   [1, 4, 5]   [7, 4, 4]   [0, 6, 3]
b   [7, 1, 9]   [1, 3, 8]   [3, 6, 2]
c   [1, 6, 6]   [6, 5, 0]   [6, 5, 9]

I would like:

    x              y              z
    mean std count mean std count mean std count
a   1    4   5     7    4   4     0    6   3
b   7    1   9     1    3   8     3    6   2
c   1    6   6     6    5   0     6    5   9
2
  • Seems like you could do this then this. Commented Dec 13, 2020 at 22:52
  • 1
    helps if you put a np.random.seed(2) or some other number, so the randomised data is constant Commented Dec 13, 2020 at 23:22

1 Answer 1

3

You can concatenate the inner lists with numpy concatenate and numpy vstack, build the MultiIndex columns, then generate a new dataframe:

np.random.seed(2)
d = {
    i: {j: [np.random.randint(10) for i in range(0, 3)] for j in ["x", "y", "z"]}
    for i in ["a", "b", "c"]
}
df = pd.DataFrame.from_dict(d, orient="index")
df

        x          y            z
a   [8, 8, 6]   [2, 8, 7]   [2, 1, 5]
b   [4, 4, 5]   [7, 3, 6]   [4, 3, 7]
c   [6, 1, 3]   [5, 8, 4]   [6, 3, 9]

data = np.vstack([np.concatenate(entry) for entry in df.to_numpy()])
columns = pd.MultiIndex.from_product([df.columns, ["mean", "std", "count"]])
pd.DataFrame(data, columns=columns, index = df.index)


                   x                 y                    z
    mean    std count   mean    std count   mean    std count
a      8    8   6        2      8   7       2       1   5
b      4    4   5        7      3   6       4       3   7
c      6    1   3        5      8   4       6       3   9

UPDATE : October 5, 2021

Another option is to convert the initial dataframe to a dictionary and concatenate with pd.concat :

outcome = {k:pd.DataFrame([*v], 
                          columns = ['mean', 'std', 'count'], 
                          index = v.index) 
           for k,v in df.items()}

pd.concat(outcome, axis = 1)
 
     x              y              z          
  mean std count mean std count mean std count
a    8   8     6    2   8     7    2   1     5
b    4   4     5    7   3     6    4   3     7
c    6   1     3    5   8     4    6   3     9

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.