Create a pandas DataFrame from NumPy arrays of different sizes

Question

I have the following numpy arrays, which have different shapes. I want to use pandas to create a dataframe so that I can display them neatly, as shown below:

numpy arrays:

et_arr:  [  8.94668401e+01   1.66449935e+01  -4.44089210e-14]
ea_arr:  [ 100.           21.84087363    1.04031209]
it: 
[[ 0.1728      1.0688      1.4848      1.6008    ]
 [ 1.36746667  1.62346667  1.63946667  0.        ]
 [ 1.64053333  1.64053333  0.          0.        ]
 [ 1.64053333  0.          0.          0.        ]]

resulting dataframe:

One way is to loop over all 3 arrays and collect the values by index. I have tried numpy.column_stack, zip, and map to some extent, but did not get the desired result.

I have always used a pandas dataframe to display results, and it was easy. This one seems a little tricky. How can I achieve this?

What information in your data tells you that it's the first entry (column 1) of ea_arr that is missing?
– andrew_reece, Commented Dec 23, 2017 at 18:35
It's 100% by default. ea stands for error approximation, which is why ea_arr[0] is 100, as you can see.
– John Honai, Commented Dec 23, 2017 at 19:05

andrew_reece · Accepted Answer · 2017-12-23 18:27:43Z


If you put the arrays into a dict called data, you can loop over its keys and build the dataframe as you go:

import pandas as pd

data = {"et_arr":[8.94668401e+01,1.66449935e+01,-4.44089210e-14],
        "ea_arr":[100.,21.84087363,1.04031209],
        "it":[[0.1728,1.0688,1.4848,1.6008],
              [1.36746667,1.62346667,1.63946667,0.],
              [1.64053333,1.64053333,0.,0.],
              [1.64053333,0.,0.,0.]]}

# To keep track of the order of dict indices we'll capture them as we loop:
indices = []
df = pd.DataFrame()

for k in data.keys():
    df = pd.concat([df, pd.DataFrame(data[k]).T], ignore_index=True).fillna(0)
    if k == "it":
        indices.extend([f"n={i+1}" for i in range(len(data[k]))])
    else:
        indices.append(k)

df.index = indices
df.columns = df.columns + 1

df
                1          2             3         4
et_arr   89.46684  16.644994 -4.440892e-14  0.000000
ea_arr  100.00000  21.840874  1.040312e+00  0.000000
n=1       0.17280   1.367467  1.640533e+00  1.640533
n=2       1.06880   1.623467  1.640533e+00  0.000000
n=3       1.48480   1.639467  0.000000e+00  0.000000
n=4       1.60080   0.000000  0.000000e+00  0.000000

Alternatively, you can assemble it all by hand, but that scales less well:

# Assumes pandas is imported as pd and that it, et_arr, and ea_arr are the arrays from the question.
df = pd.DataFrame(it)
arr_df = pd.DataFrame([et_arr,ea_arr])
df = pd.concat([df, arr_df], ignore_index=True).fillna(0)
df.columns = range(1,5)
df.columns.name = "iter"
df.index = ["n=1","n=2","n=3","n=4","et","ea"]
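As a further illustration, here is a minimal sketch of a loop-free variant. It is not part of the original answer; it relies on the fact that pandas aligns Series of unequal length and pads the gaps with NaN. The array values are copied from the question, and the row labels follow the loop version above.

```python
import numpy as np
import pandas as pd

et_arr = np.array([8.94668401e+01, 1.66449935e+01, -4.44089210e-14])
ea_arr = np.array([100.0, 21.84087363, 1.04031209])
it = np.array([[0.1728, 1.0688, 1.4848, 1.6008],
               [1.36746667, 1.62346667, 1.63946667, 0.0],
               [1.64053333, 1.64053333, 0.0, 0.0],
               [1.64053333, 0.0, 0.0, 0.0]])

# One labelled entry per 1-D array; it.T supplies the "n=..." rows,
# matching the transposed layout used in the loop version.
rows = {"et_arr": et_arr, "ea_arr": ea_arr}
rows.update({f"n={i + 1}": col for i, col in enumerate(it.T)})

# Wrapping each array in a Series lets pandas pad the shorter ones with NaN.
# The dict builds columns, so transpose to get one row per array.
df = pd.DataFrame({name: pd.Series(values) for name, values in rows.items()}).T
df.columns = df.columns + 1  # 1-based column labels, as above
```

Here the missing fourth entries of et_arr and ea_arr stay as NaN rather than being replaced by 0, so whether to fill them is a separate decision.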

edited Dec 23, 2017 at 18:27

answered Dec 23, 2017 at 18:04


2 Comments

John Honai Over a year ago

Is there a way to display nothing at all instead of using fillna(0)? The zeros give the wrong impression that et, ea, and the others are 0.0%, whereas in fact they did not need to be calculated.

andrew_reece Over a year ago

You can use fillna(''), but note that this will convert any column with NaN values to type object (as '' is a string and not a number).
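To make the dtype point concrete, here is a small sketch on a cut-down version of the data (only et_arr and two rows of it, chosen for illustration). It also shows one possible alternative not mentioned in the thread: leaving the NaN values in place and blanking them only when rendering, via the na_rep argument of DataFrame.to_string.

```python
import pandas as pd

et_arr = [89.4668401, 16.6449935, -4.44089210e-14]
it = [[0.1728, 1.0688, 1.4848, 1.6008],
      [1.36746667, 1.62346667, 1.63946667, 0.0]]

# Without fillna, the missing fourth entry of et_arr stays NaN.
df = pd.concat([pd.DataFrame([et_arr]), pd.DataFrame(it)], ignore_index=True)
df.index = ["et_arr", "n=1", "n=2"]

# Option 1: fillna('') blanks the cell, but the affected column becomes object dtype.
blank = df.fillna("")
print(blank.dtypes)

# Option 2: keep NaN in the data and hide it only in the rendered text.
text = df.to_string(na_rep="")
print(text)
```

With the second option the columns stay numeric, so later arithmetic or formatting on df still works; the blank cells exist only in the printed string.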
