I have a list of dictionaries with values that are returned as numpy arrays (and which are often empty).

data=[{'width': array([])},
      {'width': array([])},
      {'width': array([])},
      {'width': array([])},
      {'width': array([])},
      {'width': array([ 0.64848222])},
      {'width': array([ 0.62241745])},
      {'width': array([ 0.76892571])},
      {'width': array([ 0.69913647])},
      {'width': array([ 0.7506934])},
      {'width': array([ 0.69087949])},
      {'width': array([ 0.65302866])},
      {'width': array([ 0.67267989])},
      {'width': array([ 0.63862089])}]

I would like to create a DataFrame where the values are plain floats rather than numpy arrays. I'd also like the empty arrays to be converted to NaN values.

I have tried df = pd.DataFrame(data, dtype=float), which returns a DataFrame whose values are still numpy arrays:

               width
0                 []
1                 []
2                 []
3                 []
4                 []
5   [0.648482224582]
6   [0.622417447245]
7   [0.768925710479]
8   [0.699136467373]
9    [0.75069339816]
10  [0.690879488242]
11  [0.653028655088]
12  [0.672679885077]
13  [0.638620890633]

I've also tried recasting the df's values after creating it using df.values.astype(float) but get the following error: ValueError: setting an array element with a sequence.

The final output I am trying to get for the DataFrame looks like:

               width
0                NaN
1                NaN
2                NaN
3                NaN
4                NaN
5     0.648482224582
6     0.622417447245
7     0.768925710479
8     0.699136467373
9      0.75069339816
10    0.690879488242
11    0.653028655088
12    0.672679885077
13    0.638620890633

3 Answers

After you've constructed the DataFrame from data, the only extra thing you need to do is:

df.width = df.width.str[0]

This works because the .str accessor can index into each element of the column, so .str[0] returns the first element of each array. Empty arrays have no first element, so NaN is returned for those rows.

You end up with a column of float64 values:

       width
0        NaN
1        NaN
2        NaN
3        NaN
4        NaN
5   0.648482
6   0.622417
7   0.768926
8   0.699136
9   0.750693
10  0.690879
11  0.653029
12  0.672680
13  0.638621
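
For anyone who wants to try this without the original data, here is a minimal self-contained sketch. The three-row data list is made up for illustration, and the exact result dtype may depend on your pandas version, so it is worth checking df.dtypes yourself.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data: two empty arrays, one value.
data = [{'width': np.array([])},
        {'width': np.array([])},
        {'width': np.array([0.64848222])}]

df = pd.DataFrame(data)            # the column holds numpy arrays (object dtype)
df['width'] = df['width'].str[0]   # first element of each array, NaN if empty

print(df)
print(df.dtypes)
```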

Note: if you want to display more decimal places, you'll need to adjust the float display precision using pd.set_option (the display.precision option).
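
As a rough illustration of the precision note, pd.set_option with the display.precision key changes how many decimal places pandas prints. The value 12 below is an arbitrary choice, and the setting only affects display, not the stored floats.

```python
import pandas as pd

# Print floats with up to 12 decimal places (display only; data is unchanged).
pd.set_option('display.precision', 12)

s = pd.Series([0.648482224582, 0.75069339816], name='width')
print(s)
```

pd.reset_option('display.precision') restores the default afterwards.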

Alternatively, you can process the list before you construct the DataFrame:

pd.DataFrame([x.get('width') for x in data], columns=['width'])

2 Comments

I like your alternative implementation using x.get('width')
Thanks! I see it's essentially the same as your method. I thought there might be a built-in Pandas way to do this (e.g. using DataFrame.from_records or similar) but I can't seem to find it...

You can use a list comprehension to extract the data from the array in the dictionary. d['width'][0] will extract the first value from the array. if d['width'].shape[0] will evaluate to False if the array is empty, in which case None is inserted.

>>> pd.DataFrame([d['width'][0] if d['width'].shape[0] else None for d in data], 
                 columns=['width'])
       width
0        NaN
1        NaN
2        NaN
3        NaN
4        NaN
5   0.648482
6   0.622417
7   0.768926
8   0.699136
9   0.750693
10  0.690879
11  0.653029
12  0.672680
13  0.638621
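
The same idea can be wrapped in a small helper, which may be easier to reuse. This is only a sketch: first_or_nan is a hypothetical name, the sample data is invented, and it uses .size and np.nan in place of .shape[0] and None.

```python
import numpy as np
import pandas as pd

def first_or_nan(arr):
    """Return the first element of a 1-D array as a float, or NaN if it is empty."""
    arr = np.asarray(arr)
    return float(arr[0]) if arr.size else np.nan

# Invented sample in the same shape as the question's data.
data = [{'width': np.array([])},
        {'width': np.array([0.62241745])}]

df = pd.DataFrame({'width': [first_or_nan(d['width']) for d in data]})
print(df)
print(df.dtypes)
```

Returning np.nan instead of None keeps every entry a float, so the column should come out as float64 without relying on pandas to convert None.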

Try this after getting the dataframe you posted:

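import numpy as np

def convert(x):
    # Empty array: there is no value to extract, so use NaN
    if len(x) == 0:
        return np.nan
    # Otherwise take the single value out of the array
    return x[0]

df['width'] = df['width'].apply(convert)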
