Creating DataFrame with list of dictionaries with np.array values

Question

I have a list of dictionaries with values that are returned as numpy arrays (and which are often empty).

data=[{'width': array([])},
      {'width': array([])},
      {'width': array([])},
      {'width': array([])},
      {'width': array([])},
      {'width': array([ 0.64848222])},
      {'width': array([ 0.62241745])},
      {'width': array([ 0.76892571])},
      {'width': array([ 0.69913647])},
      {'width': array([ 0.7506934])},
      {'width': array([ 0.69087949])},
      {'width': array([ 0.65302866])},
      {'width': array([ 0.67267989])},
      {'width': array([ 0.63862089])}]

I would like to create a DataFrame where the values are floats rather than numpy arrays. I'd also like the empty arrays to be converted to NaN values.

I have tried using df=pd.DataFrame(data, dtype=float), which returns a DataFrame whose values are still np.arrays:

               width
0                 []
1                 []
2                 []
3                 []
4                 []
5   [0.648482224582]
6   [0.622417447245]
7   [0.768925710479]
8   [0.699136467373]
9    [0.75069339816]
10  [0.690879488242]
11  [0.653028655088]
12  [0.672679885077]
13  [0.638620890633]

I've also tried recasting the df's values after creating it using df.values.astype(float) but get the following error: ValueError: setting an array element with a sequence.

The final output I am trying to get for the DataFrame looks like:

               width
0                NaN
1                NaN
2                NaN
3                NaN
4                NaN
5     0.648482224582
6     0.622417447245
7     0.768925710479
8     0.699136467373
9      0.75069339816
10    0.690879488242
11    0.653028655088
12    0.672679885077
13    0.638620890633

Alex Riley · Accepted Answer · 2015-08-13 21:52:56Z

1

After you've constructed the DataFrame from data, the only extra thing you need to do is:

df.width = df.width.str[0]

This works because we're just using the .str accessor to get the first element of each list. Empty lists don't have a first element so NaN is returned for those rows.

You end up with a column of float64 values:

       width
0        NaN
1        NaN
2        NaN
3        NaN
4        NaN
5   0.648482
6   0.622417
7   0.768926
8   0.699136
9   0.750693
10  0.690879
11  0.653029
12  0.672680
13  0.638621
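For reference, here is a minimal self-contained sketch of the .str[0] approach. It assumes a shortened stand-in for the question's data (four rows instead of fourteen), and the trailing astype(float) is a defensive addition, not part of the original one-liner, in case the extracted column comes back as object dtype on some pandas version.

```python
import numpy as np
import pandas as pd

# Shortened stand-in for the question's data:
# two empty arrays followed by two one-element arrays.
data = [
    {'width': np.array([])},
    {'width': np.array([])},
    {'width': np.array([0.64848222])},
    {'width': np.array([0.62241745])},
]

# At this point 'width' is an object column holding numpy arrays.
df = pd.DataFrame(data)

# .str[0] takes the first element of each array; empty arrays have none,
# so those rows become NaN. astype(float) is an extra safeguard (assumption)
# to guarantee a float64 column.
df['width'] = df['width'].str[0].astype(float)

print(df)
print(df['width'].dtype)
```

The empty-array rows should show up as NaN and the remaining rows as plain floats.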

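Note: if you want to display more decimal places, you'll need to adjust the float display precision using pd.set_option (for example, the display.precision option). This only changes how values are printed, not what is stored.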
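As a rough illustration of the display setting, the sketch below uses a small made-up two-row frame and an arbitrarily chosen precision of 12 digits. pd.option_context applies the setting temporarily, while pd.set_option changes it for the rest of the session.

```python
import pandas as pd

# Small illustrative frame (values taken from the question's output).
df = pd.DataFrame({'width': [0.648482224582, 0.75069339816]})

# Temporarily show up to 12 digits after the decimal point.
with pd.option_context('display.precision', 12):
    print(df)

# Or change the setting for the rest of the session.
pd.set_option('display.precision', 12)
print(df)
```

The stored float64 values are the same either way; only the printed representation changes.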

Alternatively, you can process the list before you construct the DataFrame:

pd.DataFrame([x.get('width') for x in data], columns=['width'])

edited Aug 13, 2015 at 21:52

answered Aug 13, 2015 at 21:04

Alex Riley


2 Comments

Alexander Over a year ago

I like your alternative implementation using x.get('width')

Alex Riley Over a year ago

Thanks! I see it's essentially the same as your method. I thought there might be a built-in Pandas way to do this (e.g. using DataFrame.from_records or similar) but I can't seem to find it...

Alexander · Accepted Answer · 2015-08-13 21:26:31Z

1

You can use a list comprehension to extract the data from the array in each dictionary. d['width'][0] extracts the first value from the array. The condition d['width'].shape[0] is 0 for an empty array and so evaluates to False, in which case None is inserted; pandas then shows it as NaN in the resulting float column.

>>> pd.DataFrame([d['width'][0] if d['width'].shape[0] else None for d in data], 
                 columns=['width'])
       width
0        NaN
1        NaN
2        NaN
3        NaN
4        NaN
5   0.648482
6   0.622417
7   0.768926
8   0.699136
9   0.750693
10  0.690879
11  0.653029
12  0.672680
13  0.638621

answered Aug 13, 2015 at 21:26

Alexander


DeepSpace · Accepted Answer · 2015-08-13 21:04:25Z

0

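Try this after getting the DataFrame you posted:

import numpy as np

def convert(x):
    # Empty array: there is no value to extract, so return NaN
    if len(x) == 0:
        return np.nan
    # Otherwise return the array's first (and only) element
    else:
        return x[0]

df['width'] = df['width'].apply(convert)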

answered Aug 13, 2015 at 21:04

DeepSpace
