0

I am iterating over a number of columns and storing their summary statistics (mean, median, skewness and kurtosis) in a dict, as below:

metrics_dict['skewness'] = data_col.skew().values[0]
metrics_dict['kurtosis'] = data_col.kurt().values[0]
metrics_dict['mean'] = np.mean(data_col)[0]
metrics_dict['median'] = np.median(data_col)

However, for some columns it raises the following error:

IndexError: index out of bounds

The column in question is below:

Index          device
61021           C:2
61022          D:3+
61023          D:3+
61024           B:1
61025          D:3+
61026           C:2 

I simply want to add NA to the dict for such a column rather than have the error interrupt my loop. Here, Index is just the index of the dataframe, and the column being processed is device. Please note that the data has a large number of numeric columns (~500), of which 2-3 are like device, so I just need to add NA to the dict for these and move on to the next column. Can someone please tell me how to do that in Python?

4
  • But where is the numeric data on which you are performing calculations? I'm guessing it's not Index (an identifier column) or device (a string column). Commented Jun 26, 2018 at 9:32
  • Hello @jpp. There are over 500 columns, and I am iterating over them in a loop and computing these metrics. Some columns are like device, so for them I just need to add NA to the dictionary and move on to the next column. Hope that answers your question. Commented Jun 26, 2018 at 9:34
  • Please supply a minimal reproducible example. In this case, that means show us how you construct your for loop; e.g. what's data_col ? Commented Jun 26, 2018 at 9:38
  • You can call those statistical functions on the whole df (instead of each column individually). Pandas will then compute the values automatically only for numerical columns. link to documentation Commented Jun 26, 2018 at 9:45
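
The whole-DataFrame approach from the last comment could look roughly like the sketch below. The sample frame and the names `df` and `summary` are made up for illustration. Passing `numeric_only=True` explicitly is an assumption added here so that the result does not depend on how a given pandas version treats string columns by default.

```python
import pandas as pd

# Hypothetical sample data: two numeric columns plus a string column like `device`.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 10.0],
    "b": [5, 3, 8, 1, 7],
    "device": ["C:2", "D:3+", "D:3+", "B:1", "D:3+"],
})

# Each call returns a Series indexed by the numeric column names only.
summary = pd.DataFrame({
    "mean": df.mean(numeric_only=True),
    "median": df.median(numeric_only=True),
    "skewness": df.skew(numeric_only=True),
    "kurtosis": df.kurt(numeric_only=True),
})

# Reindexing on all columns adds a row of NaN for every non-numeric column.
summary = summary.reindex(df.columns)
print(summary)
```

If a dict is still needed, `summary.to_dict(orient="index")` gives one entry per column.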

3 Answers

1

Since these statistics are only meaningful for numeric columns, you can try isolating numeric columns. This is possible using pd.DataFrame.select_dtypes:

numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']

numeric_cols = df.select_dtypes(include=numerics).columns

for col in df:
    if col in numeric_cols:
        pass  # calculate & add some values to dictionary
    else:
        pass  # add NA values to dictionary
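
A filled-in version of this loop might look like the following sketch. The sample frame, the `metrics` dict and the `stat_names` list are hypothetical, and `include="number"` is used as a shorthand that selects all numeric dtypes instead of listing them one by one.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two numeric columns plus a string column like `device`.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 10.0],
    "b": [5, 3, 8, 1, 7],
    "device": ["C:2", "D:3+", "D:3+", "B:1", "D:3+"],
})

numeric_cols = set(df.select_dtypes(include="number").columns)
stat_names = ["mean", "median", "skewness", "kurtosis"]

metrics = {}
for col in df.columns:
    if col in numeric_cols:
        s = df[col]
        metrics[col] = {
            "mean": s.mean(),
            "median": s.median(),
            "skewness": s.skew(),
            "kurtosis": s.kurt(),
        }
    else:
        # Non-numeric column: record NaN for every statistic and move on.
        metrics[col] = dict.fromkeys(stat_names, np.nan)
```

Checking the dtype up front means no exception is raised for columns like device, so the loop never needs error handling.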

1 Comment

Thanks a lot @jpp.. Your solution works beautifully!
0

Select the column from the dataframe where you want to set empty values to NaN.

df.loc[df['col'] == '', 'col'] = np.nan

Hope this helps.

2 Comments

If you're going to downvote, at least give a reason so that I can learn and improve whatever is wrong or bad.
It seems like you are answering another question. The output of OP is a dictionary, not a new dataframe column.
0

You could wrap the calculation in a try/except that catches IndexError:

try:
    # whatever you do that might raise an IndexError
    metrics_dict['skewness'] = data_col.skew().values[0]
except IndexError:
    # add NA to the dict instead
    metrics_dict['skewness'] = np.nan
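
To avoid repeating the try/except for every statistic, the pattern can be wrapped in a small helper. The sketch below is only an illustration: `safe_stat` is a hypothetical name, and an empty list stands in for the empty result that a non-numeric column produced in the question. The exception raised for such a column may differ between library versions, so the exception types are a parameter rather than hard-coded.

```python
import math

def safe_stat(compute, exceptions=(IndexError, TypeError, ValueError), default=math.nan):
    """Return compute(), or `default` if it raises one of `exceptions`."""
    try:
        return compute()
    except exceptions:
        return default

# Stand-in for a non-numeric column: indexing an empty result raises IndexError.
empty_result = []

metrics_dict = {
    "skewness": safe_stat(lambda: empty_result[0]),  # falls back to NaN
    "mean": safe_stat(lambda: [2.5][0]),             # computed normally
}
```

With a real column, the call would be along the lines of `safe_stat(lambda: data_col.skew().values[0])`.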
