Set datatype after converting null values while reading from csv to DataFrame with Pandas

Question

I have a .csv file with GPS data which looks like this:

ID,GPS_LATITUDE,GPS_LONGITUDE
1,35.66727683,139.7591279
2,35.66727683,139.7591279
3,-1,-1
4,35.66750697,139.7589757
5,,139.7589757

The last row has a blank ("null") latitude. I would like to read the data into a DataFrame as type float, with the null value replaced by -1. With my code, the values come back as strings and the null value is not substituted.

How I'm trying to do it (wrong):

import numpy as np
import pandas as pd

data = r'c:\temp\gps.csv'

def conv(val):
    if val == np.nan:
        return -1
    return val

df = pd.read_csv(data,converters={'GPS_LATITUDE':conv,'GPS_LONGITUDE':conv},dtype={'GPS_LATITUDE':np.float64,'GPS_LONGITUDE':np.float64})

Code to test output:

lats = df['GPS_LATITUDE'].tolist()
for l in lats:    
    print(l,type(l))
df

Output:

35.66727683 <class 'str'>
35.66727683 <class 'str'>
-1 <class 'str'>
35.66750697 <class 'str'>
 <class 'str'>
Out[63]:
ID  GPS_LATITUDE    GPS_LONGITUDE
0   1   35.66727683 139.7591279
1   2   35.66727683 139.7591279
2   3   -1  -1
3   4   35.66750697 139.7589757
4   5       139.7589757

Nehal J Wani · Accepted Answer · 2016-09-01 17:59:06Z

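First of all, you don't need a conv function at all. read_csv parses blank fields as NaN, so the columns can be read as float64 and the missing values filled afterwards: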

$ cat /tmp/a.csv
ID,GPS_LATITUDE,GPS_LONGITUDE
1,35.66727683,139.7591279
2,35.66727683,139.7591279
3,-1,-1
4,35.66750697,139.7589757
5,,139.7589757

In [15]: df = pd.read_csv("/tmp/a.csv", dtype={'GPS_LATITUDE':np.float64,'GPS_LONGITUDE':np.float64})

In [16]: df
Out[16]: 
   ID  GPS_LATITUDE  GPS_LONGITUDE
0   1     35.667277     139.759128
1   2     35.667277     139.759128
2   3     -1.000000      -1.000000
3   4     35.667507     139.758976
4   5           NaN     139.758976

In [18]: df.dtypes
Out[18]: 
ID                 int64
GPS_LATITUDE     float64
GPS_LONGITUDE    float64
dtype: object

In [19]: df.fillna(-1, inplace = True)

In [20]: df
Out[20]: 
   ID  GPS_LATITUDE  GPS_LONGITUDE
0   1     35.667277     139.759128
1   2     35.667277     139.759128
2   3     -1.000000      -1.000000
3   4     35.667507     139.758976
4   5     -1.000000     139.758976
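
Note that df.fillna(-1) fills missing values in every column. If the file had other columns whose blanks should stay as NaN, the fill can be limited to the GPS columns. The following is a minimal sketch of that variant, not part of the session above; it inlines the sample data with io.StringIO so it runs without a file, and the names CSV_TEXT and GPS_COLUMNS are made up for illustration.

```python
import io

import numpy as np
import pandas as pd

# Same sample data as above, inlined so the snippet runs without a file on disk.
CSV_TEXT = """ID,GPS_LATITUDE,GPS_LONGITUDE
1,35.66727683,139.7591279
2,35.66727683,139.7591279
3,-1,-1
4,35.66750697,139.7589757
5,,139.7589757
"""

# Hypothetical helper name: the columns that should get the -1 sentinel.
GPS_COLUMNS = ["GPS_LATITUDE", "GPS_LONGITUDE"]

# Blank fields are parsed as NaN, which fits in a float64 column.
df = pd.read_csv(
    io.StringIO(CSV_TEXT),
    dtype={col: np.float64 for col in GPS_COLUMNS},
)

# Fill only the GPS columns; missing values elsewhere would be left untouched.
df[GPS_COLUMNS] = df[GPS_COLUMNS].fillna(-1)

print(df.dtypes)
print(df)
```

Assigning the filled columns back avoids inplace=True and makes it explicit which columns are changed.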

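Second, if you do want to use conv, change it as shown below. A converter receives each field as a string, so a blank field arrives as an empty string, never as np.nan, and the result has to be converted to float explicitly. (Also, if you are using conv for all columns, then there is no need to specify dtype.)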

In [21]: def conv(val):
   ....:     if not val:
   ....:         return -1
   ....:     return np.float64(val)
   ....: 

In [22]: df = pd.read_csv("/tmp/a.csv", converters={'GPS_LATITUDE':conv,'GPS_LONGITUDE':conv})

In [23]: df
Out[23]: 
   ID  GPS_LATITUDE  GPS_LONGITUDE
0   1     35.667277     139.759128
1   2     35.667277     139.759128
2   3     -1.000000      -1.000000
3   4     35.667507     139.758976
4   5     -1.000000     139.758976

In [24]: df.dtypes
Out[24]: 
ID                 int64
GPS_LATITUDE     float64
GPS_LONGITUDE    float64
dtype: object

In either case:

In [26]: lats = df['GPS_LATITUDE'].tolist()

In [27]: for l in lats:
   ....:     print(l,type(l))
   ....:     
(35.667276829999999, <type 'numpy.float64'>)
(35.667276829999999, <type 'numpy.float64'>)
(-1.0, <type 'numpy.float64'>)
(35.667506969999998, <type 'numpy.float64'>)
(-1.0, <type 'numpy.float64'>)
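
As for why the original conv never substituted anything, two things seem to be at play: NaN never compares equal to anything (including itself), and the converter is handed the raw text of each field rather than a parsed number. The sketch below illustrates both points; it is written for Python 3, and the names CSV_TEXT, seen and record are hypothetical, introduced only for this illustration.

```python
import io

import numpy as np
import pandas as pd

# Two data rows, the second with a blank latitude.
CSV_TEXT = "ID,GPS_LATITUDE\n1,35.66727683\n2,\n"

# NaN is not equal to anything, not even itself, so `val == np.nan`
# can never be true. Use np.isnan() or pd.isna() to test for NaN.
print(np.nan == np.nan)
print(np.isnan(np.nan))

seen = []  # raw values handed to the converter


def record(val):
    """Hypothetical converter that only records what it receives."""
    seen.append(val)
    return val


pd.read_csv(io.StringIO(CSV_TEXT), converters={"GPS_LATITUDE": record})

# The converter saw strings, with an empty string for the blank field.
print(seen)
print([type(v).__name__ for v in seen])
```

This is why the corrected conv tests `if not val` (true for the empty string) and wraps the other values in np.float64.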
