I have a .csv file with GPS data which looks like this:

ID,GPS_LATITUDE,GPS_LONGITUDE
1,35.66727683,139.7591279
2,35.66727683,139.7591279
3,-1,-1
4,35.66750697,139.7589757
5,,139.7589757

The last row has a blank ("null") latitude. I would like to read the data into a DataFrame, replace the null value with -1, and have both GPS columns read as float. With my code, the values come back as strings and the null value is not replaced.

How I'm trying to do it (wrong):

import numpy as np
import pandas as pd

data = r'c:\temp\gps.csv'

def conv(val):
    if val == np.nan:
        return -1
    return val

df = pd.read_csv(data, converters={'GPS_LATITUDE': conv, 'GPS_LONGITUDE': conv},
                 dtype={'GPS_LATITUDE': np.float64, 'GPS_LONGITUDE': np.float64})
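A quick check of why this converter never substitutes anything. Two things combine: NaN never compares equal to anything (including itself), and `read_csv` calls a converter with the raw text of each field, so the blank field arrives as the empty string rather than NaN. A minimal sketch, assuming a two-row inline CSV read through `io.StringIO`; `record`, `seen` and `csv_text` are hypothetical names used only for this demonstration:

```python
import io

import numpy as np
import pandas as pd

# NaN is never equal to anything, itself included,
# so `val == np.nan` is False for every input.
print(np.nan == np.nan)  # False

seen = []


def record(val):
    # Hypothetical helper: store exactly what read_csv passes to the converter.
    seen.append(val)
    return val


csv_text = "ID,GPS_LATITUDE\n1,35.66727683\n5,\n"
pd.read_csv(io.StringIO(csv_text), converters={"GPS_LATITUDE": record})

# The converter sees strings; the blank field is "".
print(seen)  # ['35.66727683', '']
```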

Code to test output:

lats = df['GPS_LATITUDE'].tolist()
for l in lats:    
    print(l,type(l))
df

Output:

35.66727683 <class 'str'>
35.66727683 <class 'str'>
-1 <class 'str'>
35.66750697 <class 'str'>
 <class 'str'>
Out[63]:
   ID GPS_LATITUDE GPS_LONGITUDE
0   1  35.66727683   139.7591279
1   2  35.66727683   139.7591279
2   3           -1            -1
3   4  35.66750697   139.7589757
4   5                139.7589757

1 Answer

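First of all, you don't need a conv function at all. read_csv already parses the blank field as NaN and the GPS columns as float64, so you can simply replace the NaN afterwards with fillna: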

$ cat /tmp/a.csv
ID,GPS_LATITUDE,GPS_LONGITUDE
1,35.66727683,139.7591279
2,35.66727683,139.7591279
3,-1,-1
4,35.66750697,139.7589757
5,,139.7589757

In [15]: df = pd.read_csv("/tmp/a.csv", dtype={'GPS_LATITUDE':np.float64,'GPS_LONGITUDE':np.float64})

In [16]: df
Out[16]: 
   ID  GPS_LATITUDE  GPS_LONGITUDE
0   1     35.667277     139.759128
1   2     35.667277     139.759128
2   3     -1.000000      -1.000000
3   4     35.667507     139.758976
4   5           NaN     139.758976

In [18]: df.dtypes
Out[18]: 
ID                 int64
GPS_LATITUDE     float64
GPS_LONGITUDE    float64
dtype: object

In [19]: df.fillna(-1, inplace = True)

In [20]: df
Out[20]: 
   ID  GPS_LATITUDE  GPS_LONGITUDE
0   1     35.667277     139.759128
1   2     35.667277     139.759128
2   3     -1.000000      -1.000000
3   4     35.667507     139.758976
4   5     -1.000000     139.758976
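The same first approach as a self-contained script, assuming the CSV content is inlined via `io.StringIO` instead of being read from `/tmp/a.csv`. As a small variation on the session above, it passes a dict to `fillna` so only the two GPS columns are filled, and reassigns the result instead of using `inplace=True`:

```python
import io

import numpy as np
import pandas as pd

# Same rows as /tmp/a.csv, inlined so the sketch runs anywhere.
csv_text = """ID,GPS_LATITUDE,GPS_LONGITUDE
1,35.66727683,139.7591279
2,35.66727683,139.7591279
3,-1,-1
4,35.66750697,139.7589757
5,,139.7589757
"""

# The blank field becomes NaN; dtype just makes float64 explicit.
df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"GPS_LATITUDE": np.float64, "GPS_LONGITUDE": np.float64},
)

# Replace NaN with -1 in the GPS columns only (dict form of fillna).
df = df.fillna({"GPS_LATITUDE": -1, "GPS_LONGITUDE": -1})

print(df.dtypes)
print(df)
```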

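Second, if you do want to use conv, change it as shown below. The converter receives the raw text of each field, so the blank field arrives as the empty string '' (not NaN), and whatever it returns becomes the cell value, so it must return the float itself. You also don't need dtype for the columns that have a converter: when both are given for the same column, pandas uses only the converter.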

In [21]: def conv(val):
   ....:     if not val:
   ....:         return -1
   ....:     return np.float64(val)
   ....: 

In [22]: df = pd.read_csv("/tmp/a.csv", converters={'GPS_LATITUDE':conv,'GPS_LONGITUDE':conv})

In [23]: df
Out[23]: 
   ID  GPS_LATITUDE  GPS_LONGITUDE
0   1     35.667277     139.759128
1   2     35.667277     139.759128
2   3     -1.000000      -1.000000
3   4     35.667507     139.758976
4   5     -1.000000     139.758976

In [24]: df.dtypes
Out[24]: 
ID                 int64
GPS_LATITUDE     float64
GPS_LONGITUDE    float64
dtype: object
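The converter approach as a self-contained script, again assuming the CSV is inlined with `io.StringIO`. One small deviation from the session above, labeled as such: the sentinel is returned as `-1.0` rather than `-1`, so every value the converter produces is a float:

```python
import io

import numpy as np
import pandas as pd

# Same rows as /tmp/a.csv, inlined so the sketch is self-contained.
csv_text = """ID,GPS_LATITUDE,GPS_LONGITUDE
1,35.66727683,139.7591279
2,35.66727683,139.7591279
3,-1,-1
4,35.66750697,139.7589757
5,,139.7589757
"""


def conv(val):
    # read_csv passes the raw field text; a blank field is the empty string.
    if not val:
        return -1.0  # float sentinel keeps the column uniformly float
    return np.float64(val)


# No dtype needed: the converter already returns floats.
df = pd.read_csv(
    io.StringIO(csv_text),
    converters={"GPS_LATITUDE": conv, "GPS_LONGITUDE": conv},
)

print(df.dtypes)
print(df)
```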

In either case:

In [26]: lats = df['GPS_LATITUDE'].tolist()

In [27]: for l in lats:
   ....:     print(l,type(l))
   ....:     
(35.667276829999999, <type 'numpy.float64'>)
(35.667276829999999, <type 'numpy.float64'>)
(-1.0, <type 'numpy.float64'>)
(35.667506969999998, <type 'numpy.float64'>)
(-1.0, <type 'numpy.float64'>)
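The printed tuples above come from a Python 2 session. In current pandas, `Series.tolist()` converts each element to a built-in Python scalar, so the same loop prints `<class 'float'>` instead of `numpy.float64`; checking the column's dtype is the more direct test that the data was read as float. A minimal sketch, assuming a two-row inline CSV:

```python
import io

import pandas as pd

csv_text = "ID,GPS_LATITUDE\n1,35.66727683\n5,\n"
df = pd.read_csv(io.StringIO(csv_text)).fillna(-1)

# The column dtype is the authoritative check for "read as float".
print(df["GPS_LATITUDE"].dtype)  # float64

# tolist() returns built-in Python scalars, not NumPy ones.
lats = df["GPS_LATITUDE"].tolist()
print(lats)  # [35.66727683, -1.0]
print({type(x).__name__ for x in lats})  # {'float'}
```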