4

In the dataframe below:

 T2MN   T2MX   RH2M DFP2M RAIN 
 6.96   9.32  84.27  5.57    - 
 6.31  10.46  -      5.63    - 
  -    10.66  79.38  3.63    - 
 0.79   4.45  94.24  1.85    - 
 1.45   3.99  91.71  1.17    - 

How do I replace all the - with NaN's. I do not want to specify column names since I do not know before hand which column will have -

1
  • That looks like the display value for nan Commented Feb 11, 2017 at 4:25

4 Answers 4

5

If those are strings, then your floats are probably also strings.

Assuming your dataframe is df, I'd try

pd.to_numeric(df.stack(), 'coerce').unstack()

Deeper explanation

Pandas doesn't usually represent missing floats with '-'. Therefore, that '-' must be a string. Thus, the dtype of any column with a '-' in it, must be 'object'. That makes it highly likely that whatever parsed the data, left the floats as string.

setup

from io import StringIO
import pandas as pd

txt = """T2MN   T2MX   RH2M DFP2M RAIN 
 6.96   9.32  84.27  5.57    - 
 6.31  10.46  -      5.63    - 
  -    10.66  79.38  3.63    - 
 0.79   4.45  94.24  1.85    - 
 1.45   3.99  91.71  1.17    - """

df = pd.read_csv(StringIO(txt), delim_whitespace=True)
print(df)

   T2MN   T2MX   RH2M  DFP2M RAIN
0  6.96   9.32  84.27   5.57    -
1  6.31  10.46      -   5.63    -
2     -  10.66  79.38   3.63    -
3  0.79   4.45  94.24   1.85    -
4  1.45   3.99  91.71   1.17    -

What are the dtypes?

print(df.dtypes)

T2MN      object
T2MX     float64
RH2M      object
DFP2M    float64
RAIN      object
dtype: object

What is the type of the first element?

print(type(df.iloc[0, 0]))

<class 'str'>

This means that any column with a '-' is like a column of strings that look like floats. You want to use pd.to_numeric with parameter errors='coerce' to force non-numeric items to np.nan. However, pd.to_numeric does not operate on a pd.DataFrame so we stack and unstack.

pd.to_numeric(df.stack(), 'coerce').unstack()

   T2MN   T2MX   RH2M  DFP2M  RAIN
0  6.96   9.32  84.27   5.57   NaN
1  6.31  10.46    NaN   5.63   NaN
2   NaN  10.66  79.38   3.63   NaN
3  0.79   4.45  94.24   1.85   NaN
4  1.45   3.99  91.71   1.17   NaN
Sign up to request clarification or add additional context in comments.

Comments

5

Just replace() the string:

In [10]: df.replace('-', 'NaN')
Out[10]: 
   T2MN   T2MX   RH2M  DFP2M RAIN
0  6.96   9.32  84.27   5.57  NaN
1  6.31  10.46    NaN   5.63  NaN
2   NaN  10.66  79.38   3.63  NaN
3  0.79   4.45  94.24   1.85  NaN
4  1.45   3.99  91.71   1.17  NaN

Comments

2

I think you want the actual numpy.nan instead of a string NaN as you can use a lot of methods such as fillna/isnull/notnull on the pandas.Series/pandas.DataFrame

import pandas as pd

import numpy as np

df = pd.DataFrame([['-']*10]*10)

df = df.replace('-',np.nan)

Comments

1

It looks like you were reading this data from CSV/FWF file... If it's true the easiest way to get rid of '-' would be to explain Pandas that it's NaN's representation:

df = pd.read_csv(filename, na_values=['NaN', 'nan', '-'])

Test:

In [79]: df
Out[79]:
   T2MN   T2MX   RH2M  DFP2M  RAIN
0  6.96   9.32  84.27   5.57   NaN
1  6.31  10.46    NaN   5.63   NaN
2   NaN  10.66  79.38   3.63   NaN
3  0.79   4.45  94.24   1.85   NaN
4  1.45   3.99  91.71   1.17   NaN

In [80]: df.dtypes
Out[80]:
T2MN     float64
T2MX     float64
RH2M     float64
DFP2M    float64
RAIN     float64
dtype: object

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.