
I have a pandas series with the following value_counts() output:

NaN     2741
 197    1891
 127     188
 194      42
 195      24
 122      21

When I perform describe() on this series, I get:

df[col_name].describe()
count    2738.000000
mean      172.182250
std        47.387496
min         0.000000
25%       171.250000
50%       197.000000
75%       197.000000
max       197.000000
Name: SS_D_1, dtype: float64

However, if I try to find the minimum and maximum, I get nan as the answer:

numpy.min(df[col_name].values)
nan

Also, when I try to convert it to a numpy array, I seem to get an array containing only NaNs:

numpy.array(df[col_name])

Any suggestions on how to convert from a pandas series to a numpy array successfully?

Comments:

  • df[col_name].values will return the numpy array. If you have a NaN in the data, it gets propagated by the numpy.min function, meaning if there is a NaN, np.min will always yield NaN as the answer. Try nanmin: docs.scipy.org/doc/numpy/reference/generated/… Commented Sep 4, 2015 at 20:45
  • The min of any array containing nan is also nan. To ignore nan values, try np.nanmin(df[col_name].values) (or just df[col_name].min()). Commented Sep 4, 2015 at 20:47
  • Thanks, but I also get a nan for this: numpy.array(df[col_name]).min() Commented Sep 4, 2015 at 20:50
  • The problem is that you're casting it to a numpy array before calling the min() method. pandas.Series.min() does the equivalent of np.nanmin and ignores nan values, whereas numpy.ndarray.min does the equivalent of np.min and will return nan for an array that contains one or more nans. Commented Sep 4, 2015 at 21:01
  • @user308827 - as of pandas 0.24.0, you can access the backing array of a pandas Series with .array and .to_numpy - please find an updated answer below. pandas 0.24.x release notes Commented Jan 25, 2019 at 19:14

2 Answers


Both the function np.min and the method np.ndarray.min will always return NaN for any array that contains one or more NaN values (this is standard IEEE 754 floating-point behaviour).

You could use np.nanmin, which ignores NaN values when computing the min, e.g.:

np.nanmin(df[col_name].values)

An even simpler option is just to use the pd.Series.min() method, which already ignores NaN values, i.e.:

df[col_name].min()

I have no idea why numpy.array(df[col_name]) would return an array containing only NaNs, unless df[col_name] already contained only NaNs to begin with. I assume this must be due to some other mistake in your code.
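The difference between the three approaches can be sketched on a small hypothetical series (the values here are made up for illustration, not taken from the question's data):

```python
import numpy as np
import pandas as pd

# Hypothetical series containing a NaN, mimicking the situation above
s = pd.Series([197.0, np.nan, 127.0, 0.0])

# np.min propagates NaN (IEEE 754 behaviour)
print(np.min(s.values))     # nan

# np.nanmin ignores NaN values when computing the min
print(np.nanmin(s.values))  # 0.0

# pd.Series.min skips NaN by default, so no conversion is needed
print(s.min())              # 0.0
```

Note that `numpy.array(s)` here produces the full array `[197., nan, 127., 0.]`, not an all-NaN array, which is why the all-NaN result in the question points to a problem elsewhere.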




As of pandas v0.24.0, you can access the backing array of a pandas Series with .array and .to_numpy().

pandas 0.24.x release notes. Quote: "Series.array and Index.array have been added for extracting the array backing a Series or Index... We haven't removed or deprecated Series.values or DataFrame.values, but we highly recommend using .array or .to_numpy() instead

... We recommend using Series.array when you need the array of data stored in the Series, and Series.to_numpy() when you know you need a NumPy array."
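A minimal sketch of the two accessors, assuming pandas >= 0.24 (the sample values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical series with a NaN, as in the question
s = pd.Series([197.0, np.nan, 127.0])

arr = s.to_numpy()   # a plain NumPy ndarray
backing = s.array    # the pandas extension array backing the Series

print(type(arr))         # <class 'numpy.ndarray'>
print(np.nanmin(arr))    # 127.0
```

.to_numpy() is the right choice when you specifically need an ndarray (e.g. to pass to np.nanmin); .array preserves the pandas-level storage without forcing a conversion.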
