How to replace empty series values with NaN in Python

Question

I am iterating over a number of columns and storing their summary statistics like mean, median, skewness and kurtosis in a dict as below:

metrics_dict['skewness'] = data_col.skew().values[0]
metrics_dict['kurtosis'] = data_col.kurt().values[0]
metrics_dict['mean'] = np.mean(data_col)[0]
metrics_dict['median'] = np.median(data_col)

However, for some columns it raises the error below:

IndexError: index out of bounds

The column in question is below:

Index          device
61021           C:2
61022          D:3+
61023          D:3+
61024           B:1
61025          D:3+
61026           C:2

I simply want to add NA to the dict for such a column rather than have the error interrupt my loop. Here Index is just the index of the dataframe, and the column being processed is device. Note that the data has a large number of numeric columns (~500), of which 2-3 are like device, so I just need to add NA to the dict for these and move on to the next column. Can someone please tell me how to do that in Python?

But where is the numeric data on which you are performing calculations? I'm guessing it's not Index (an identifier column) or device (a string column).
– jpp, Commented Jun 26, 2018 at 9:32
Hello @jpp, there are over 500 columns and I am iterating over them in a loop and computing these metrics. Some columns are like device, and for those I just need to add NA to the dictionary and move on to the next column. Hope that answers your question.
– Shuvayan Das, Commented Jun 26, 2018 at 9:34
Please supply a minimal reproducible example. In this case, that means showing us how you construct your for loop, e.g. what is data_col?
– jpp, Commented Jun 26, 2018 at 9:38
You can call those statistical functions on the whole df (instead of on each column individually). Pandas will then compute the values automatically only for numerical columns. (link to documentation)
– swenzel, Commented Jun 26, 2018 at 9:45
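
The whole-frame approach suggested in the last comment could look roughly like the sketch below. The sample frame and the names stats and metrics are made up for illustration. numeric_only=True is passed explicitly because the default for that parameter has changed between pandas versions, and the reindex step is one assumed way to get NaN entries for the non-numeric columns.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame: two numeric columns and one string column.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [10, 20, 30, 100],
    "device": ["C:2", "D:3+", "D:3+", "B:1"],
})

# numeric_only=True restricts each reduction to the numeric columns.
stats = pd.DataFrame({
    "mean": df.mean(numeric_only=True),
    "median": df.median(numeric_only=True),
    "skewness": df.skew(numeric_only=True),
    "kurtosis": df.kurt(numeric_only=True),
})

# Reindexing on all column names adds an all-NaN row for "device".
stats = stats.reindex(df.columns)

# One dict of metrics per column, keyed by column name.
metrics = stats.to_dict(orient="index")
```

This avoids the explicit loop entirely; metrics["device"] holds NaN for every statistic.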

jpp · Accepted Answer · 2018-06-26 10:54:44Z

1

Since these statistics are only meaningful for numeric columns, you can try isolating numeric columns. This is possible using pd.DataFrame.select_dtypes:

numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']

numeric_cols = df.select_dtypes(include=numerics).columns

for col in df:
    if col in numeric_cols:
        pass  # calculate & add some values to dictionary
    else:
        pass  # add NA values to dictionary
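
A fuller sketch of this loop, under some assumptions: the sample frame, the summarize helper, and the NA_METRICS template are hypothetical names introduced here, and include="number" is used as a shorthand for listing the numeric dtypes by name.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame: one numeric column and one string column.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "device": ["C:2", "D:3+", "D:3+", "B:1"],
})

# include="number" selects every numeric dtype.
numeric_cols = set(df.select_dtypes(include="number").columns)

def summarize(col):
    """Return the four summary statistics for one numeric Series."""
    return {
        "mean": col.mean(),
        "median": col.median(),
        "skewness": col.skew(),
        "kurtosis": col.kurt(),
    }

# Template used for columns that are not numeric.
NA_METRICS = dict.fromkeys(["mean", "median", "skewness", "kurtosis"], np.nan)

all_metrics = {}
for name in df:
    if name in numeric_cols:
        all_metrics[name] = summarize(df[name])
    else:
        all_metrics[name] = dict(NA_METRICS)
```

Checking the dtype up front means no exception is ever raised for columns like device, so the loop does not need error handling.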


1 Comment

Shuvayan Das Over a year ago

Thanks a lot @jpp.. Your solution works beautifully!

chirag · Answer · 2018-06-26 09:38:00Z

0

Select the column from the dataframe where you want to set empty values to NaN.

df.loc[df['col'] == '', 'col'] = np.nan

Hope this helps.
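
A minimal sketch of replacing empty strings in a single column, assuming a hypothetical frame with a string column col and a second column other. The boolean mask together with .loc limits the assignment to col, so the other columns in the matching rows are left alone.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "col" contains one empty string.
df = pd.DataFrame({"col": ["x", "", "y"], "other": [1, 2, 3]})

# Rows where "col" is an empty string.
mask = df["col"] == ""

# Assign NaN only to "col" in those rows; "other" is untouched.
df.loc[mask, "col"] = np.nan
```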


2 Comments

chirag Over a year ago

If you're going to downvote, at least give a reason so that I can learn and improve whatever is wrong or bad.

jpp Over a year ago

It seems like you are answering another question. The output of OP is a dictionary, not a new dataframe column.

Dovi · Answer · 2018-06-26 09:38:48Z

0

You could try a try/except that catches IndexError:

try:
    pass  # whatever you do that might raise an IndexError
except IndexError:
    pass  # append NA to dict
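
A sketch of how the try/except idea might be applied per statistic. The safe_stat helper and the sample frame are hypothetical. Catching TypeError and ValueError in addition to IndexError is an assumption made here, because the exception raised for non-numeric data can differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame: one numeric column and one string column.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "device": ["C:2", "D:3+", "D:3+", "B:1"],
})

def safe_stat(func):
    """Call func() and return NaN if it fails on non-numeric data."""
    try:
        return func()
    except (IndexError, TypeError, ValueError):
        return np.nan

all_metrics = {}
for name in df:
    col = df[name]
    all_metrics[name] = {
        "mean": safe_stat(col.mean),
        "skewness": safe_stat(col.skew),
    }
```

Wrapping each statistic separately keeps one failing call from discarding the values that did compute for the same column.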
