I have a list of numpy arrays - for example:

Let's call this LIST_A:

[array([  0.        , -11.35190205,  11.35190205,   0.        ]),
 array([  0.        ,  36.58012599, -36.58012599,   0.        ]),
 array([  0.        , -41.94408202,  41.94408202,   0.        ])]

I also have a list of lists holding the index labels for each of the numpy arrays in the list above:

Let's call this LIST_B:

[['A_A', 'A_B', 'B_A', 'B_B'],
 ['A_A', 'A_D', 'D_A', 'D_D'],
 ['B_B', 'B_C', 'C_B', 'C_C']]

I want to create a pandas DataFrame from these objects. The only way I can see to do this is to first create a Series for each numpy array in LIST_A, using the matching entry of LIST_B as its index (i.e. LIST_A[0]'s index is LIST_B[0], etc.), and then call pd.concat([s1, s2, s3, ...], axis=1) to get the desired DataFrame.

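In the above case (with LIST_A and LIST_B stored in the variables list_a and list_b) I can construct the desired DataFrame as follows:

import pandas as pd

# one Series per array, indexed by the matching list of labels
s1 = pd.Series(list_a[0], index=list_b[0])
s2 = pd.Series(list_a[1], index=list_b[1])
s3 = pd.Series(list_a[2], index=list_b[2])
# align on the union of all labels; missing entries become NaN
df = pd.concat([s1, s2, s3], axis=1)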

            0          1          2
A_A   0.000000   0.000000        NaN
A_B -11.351902        NaN        NaN
A_D        NaN  36.580126        NaN
B_A  11.351902        NaN        NaN
B_B   0.000000        NaN   0.000000
B_C        NaN        NaN -41.944082
C_B        NaN        NaN  41.944082
C_C        NaN        NaN   0.000000
D_A        NaN -36.580126        NaN
D_D        NaN   0.000000        NaN

In my actual application each of the above lists has hundreds of entries, so I don't want to create hundreds of Series objects and then concatenate them all (unless this is the only way to do it?).
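For reference, the manual construction above does not require naming each Series individually. A minimal sketch, assuming list_a and list_b have the same length and are paired by position (the names `series`, `values`, and `labels` are only illustrative). The explicit sort_index() is there because the row order produced by pd.concat may depend on the pandas version:

```python
import numpy as np
import pandas as pd

list_a = [np.array([0.0, -11.35190205, 11.35190205, 0.0]),
          np.array([0.0, 36.58012599, -36.58012599, 0.0]),
          np.array([0.0, -41.94408202, 41.94408202, 0.0])]
list_b = [['A_A', 'A_B', 'B_A', 'B_B'],
          ['A_A', 'A_D', 'D_A', 'D_D'],
          ['B_B', 'B_C', 'C_B', 'C_C']]

# Build one Series per (values, labels) pair in a single comprehension
series = [pd.Series(values, index=labels)
          for values, labels in zip(list_a, list_b)]

# Align all Series on the union of their labels (missing entries become NaN),
# then sort the labels so the row order is explicit
df = pd.concat(series, axis=1).sort_index()
print(df)
```

This still creates one Series per array, but the number of arrays no longer affects the amount of code.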

I've read through various posts on SO, such as "Adding list with different length as a new column to a dataframe" and "convert pandas series AND dataframe objects to a numpy array", but haven't been able to find an elegant solution for a case where hundreds of Series objects would need to be created to produce the desired DataFrame.

1 Answer

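This is not very different from your approach, but it should be considerably faster. It builds one label-to-value dict per array, passes the dicts to the DataFrame constructor as rows, and transposes the result so that each array becomes a column: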

df = pd.DataFrame([dict(zip(labels, values)) for labels, values in zip(list_b, list_a)]).T

Output:

             0          1          2
A_A   0.000000   0.000000        NaN
A_B -11.351902        NaN        NaN
A_D        NaN  36.580126        NaN
B_A  11.351902        NaN        NaN
B_B   0.000000        NaN   0.000000
B_C        NaN        NaN -41.944082
C_B        NaN        NaN  41.944082
C_C        NaN        NaN   0.000000
D_A        NaN -36.580126        NaN
D_D        NaN   0.000000        NaN
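
To check that the dict-based construction matches the pd.concat approach from the question, the two results can be compared directly. This is a minimal sketch, not a benchmark. It assumes the labels within each inner list are unique, since a dict keeps only the last value for a repeated key. The names `rows`, `df_dict`, `df_concat`, and `same` are illustrative, and both frames are sorted by label because the row order may differ between pandas versions:

```python
import numpy as np
import pandas as pd

list_a = [np.array([0.0, -11.35190205, 11.35190205, 0.0]),
          np.array([0.0, 36.58012599, -36.58012599, 0.0]),
          np.array([0.0, -41.94408202, 41.94408202, 0.0])]
list_b = [['A_A', 'A_B', 'B_A', 'B_B'],
          ['A_A', 'A_D', 'D_A', 'D_D'],
          ['B_B', 'B_C', 'C_B', 'C_C']]

# Dict-based construction: each dict maps label -> value and becomes one row;
# transposing turns each array into a column
rows = [dict(zip(labels, values)) for labels, values in zip(list_b, list_a)]
df_dict = pd.DataFrame(rows).T.sort_index()

# Series-based construction from the question, written as a comprehension
df_concat = pd.concat(
    [pd.Series(values, index=labels) for values, labels in zip(list_a, list_b)],
    axis=1,
).sort_index()

# Same labels, and the same values with NaN in the same positions
same = (
    list(df_dict.index) == list(df_concat.index)
    and np.allclose(df_dict.to_numpy(dtype=float),
                    df_concat.to_numpy(dtype=float),
                    equal_nan=True)
)
print(same)
```

Whether the dict-based version is actually faster on a given dataset is best confirmed by timing both on the real data.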