Numpy array with coordinates

Question

I have a DataFrame with a column with different coordinates, clustered together in other lists, like this:

    name    OBJECTID    geometry
0    NaN           1    ['-80.304852,-3.489302,0.0','-80.303087,-3.490214,0.0',...]

1    NaN           2    ['-80.27494,-3.496571,0.0',...]

2    NaN           3    ['-80.267987,-3.500003,0.0',...]

I want to separate the values and remove the '0.0', but keep them inside the lists to add them to a certain key in a dictionary, that looks like this:

    name    OBJECTID    geometry
0    NaN           1    [[-80.304852, -3.489302],[-80.303087, -3.490214],...]

1    NaN           2    [[-80.27494, -3.496571],...]

2    NaN           3    [[-80.267987, -3.500003],...]

This is my attempt, which didn't work. I tried to separate the values in a for loop:

import pandas as pd
import numpy as np

r = pd.read_csv('data.csv') 
rloc = np.asarray(r['geometry'])

r['latitude'] = np.zeros(r.shape[0],dtype= r['geometry'].dtype)
r['longitude'] = np.zeros(r.shape[0],dtype= r['geometry'].dtype)

# Separating the latitude and longitude values from each string.
for i in range(0, len(rloc)):
    for j in range(0, len(rloc[i])):
        coord = rloc[i][j].split(',')
        r['longitude'] = coord[0]
        r['latitude'] = coord[1]

r = r[['OBJECTID', 'latitude', 'longitude', 'name']]

Edit: The result wasn't good because it printed out only one value for each one.

  OBJECTID  latitude    longitude   name
0        1  -3.465566   -80.151633  NaN
1        2  -3.465566   -80.151633  NaN
2        3  -3.465566   -80.151633  NaN

Bonus question: How can I add all of these longitude and latitude values inside a tuple to use with geopy? Like this:

r['location'] = (r['latitude'], r['longitude'])

So, instead, the geometry column would look like this:

geometry
[(-80.304852, -3.489302),(-80.303087, -3.490214),...]

[(-80.27494, -3.496571),...]

[(-80.267987, -3.500003),...]

Edit:

The data looked like this at first (for each row):

<LineString><coordinates>-80.304852,-3.489302,0.0 -80.303087,-3.490214,0.0 ...</coordinates></LineString>

I modified it with regex, using this code:

import re

geo = np.asarray(r['geometry'])
# Strip the XML tags, leaving only the coordinate text
geo = [re.sub(r'<.*?>', '', string) for string in geo]

And then I placed it in an array:

rv = [g.split() for g in geo]
# Assign the list of lists directly; np.asarray raises on ragged lists in recent NumPy
r['geometry'] = rv

When I call r['geometry'], the output is:

0    [-80.304852,-3.489302,0.0, -80.303087,-3.49021...
1    [-80.27494,-3.496571,0.0, -80.271963,-3.49266,...
2    [-80.267987,-3.500003,0.0, -80.267845,-3.49789...
Name: geometry, dtype: object

And r['geometry'][0] is:

 ['-80.304852,-3.489302,0.0',
 '-80.303087,-3.490214,0.0',
 '-80.302131,-3.491878,0.0',
 '-80.300763,-3.49213,0.0']

Updated with the result! It doesn't work because the list that's inside is removed... I'm trying to find a way around that.
– Takusui, Commented Feb 11, 2018 at 19:27

Mr. T · Accepted Answer · 2018-02-12 16:50:45Z

2

A pandas solution with input from a toy data set:

import pandas as pd

df = pd.read_csv("test.txt")
   name  OBJECTID                                           geometry
0   NaN         1  ['-80.3,-3.4,0.0','-80.3,-3.9,0.0','-80.3,-3.9...
1   NaN         2  ['80.2,-4.4,0.0','-81.3,2.9,0.0','-80.7,-3.2,0...
2   NaN         3  ['-80.1,-3.2,0.0','-80.8,-2.9,0.0','-80.1,-1.9...

Now the transformation into columns of longitude-latitude pairs:

#regex extraction of longitude latitude pairs
pairs = r"(-?\d+\.\d+,-?\d+\.\d+)"
s = df["geometry"].str.extractall(pairs)
#splitting string into two parts, creating two columns for longitude latitude
s = s[0].str.split(",", expand = True)  
#converting strings into float numbers - is this even necessary?
s[[0, 1]] = s[[0, 1]].apply(pd.to_numeric)
#creating a tuple from longitude/latitude columns
s["lat_long"] = list(zip(s[0], s[1]))
#placing the tuples as columns in original dataframe 
df = pd.concat([df, s["lat_long"].unstack(level = -1)], axis = 1)

Output from the toy data set:

   name  OBJECTID                                           geometry  \
0   NaN         1  ['-80.3,-3.4,0.0','-80.3,-3.9,0.0','-80.3,-3.9...   
1   NaN         2  ['80.2,-4.4,0.0','-81.3,2.9,0.0','-80.7,-3.2,0...   
2   NaN         3  ['-80.1,-3.2,0.0','-80.8,-2.9,0.0','-80.1,-1.9...   

               0              1              2  
0  (-80.3, -3.4)  (-80.3, -3.9)  (-80.3, -3.9)  
1   (80.2, -4.4)   (-81.3, 2.9)  (-80.7, -3.2)  
2  (-80.1, -3.2)  (-80.8, -2.9)  (-80.1, -1.9)

Alternatively, you can combine the tuples in one column as a list:

s["lat_long"] = list(zip(s[0], s[1]))
#placing the tuples as a list into a column of the original dataframe 
df["lat_long"] = s.groupby(level=[0])["lat_long"].apply(list)

Output now:

   name  OBJECTID                                           geometry  \
0   NaN         1  ['-80.3,-3.4,0.0','-80.3,-3.9,0.0','-80.3,-3.9...   
1   NaN         2  ['80.2,-4.4,0.0','-81.3,2.9,0.0','-80.7,-3.2,0...   
2   NaN         3  ['-80.1,-3.2,0.0','-80.8,-2.9,0.0','-80.1,-1.9...   

                                        lat_long  
0  [(-80.3, -3.4), (-80.3, -3.9), (-80.3, -3.9)]  
1    [(80.2, -4.4), (-81.3, 2.9), (-80.7, -3.2)]  
2  [(-80.1, -3.2), (-80.8, -2.9), (-80.1, -1.9)]
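If the geometry cells hold Python lists of strings rather than one string per row (as `r['geometry'][0]` in the question suggests), `.str.extractall` likely returns an empty frame, because it only processes string elements. A minimal sketch of one workaround, assuming a hypothetical toy frame built in memory and a new column name `lon_lat` chosen for illustration, is to join each list into a single string first:

```python
import pandas as pd

# Hypothetical toy frame: each geometry cell is a list of "lon,lat,alt" strings.
df = pd.DataFrame({
    "OBJECTID": [1, 2],
    "geometry": [
        ["-80.304852,-3.489302,0.0", "-80.303087,-3.490214,0.0"],
        ["-80.27494,-3.496571,0.0"],
    ],
})

# Join each list into one space-separated string so the .str methods see text.
joined = df["geometry"].str.join(" ")

# Capture longitude and latitude as two groups; the trailing altitude is skipped.
s = joined.str.extractall(r"(-?\d+\.\d+),(-?\d+\.\d+)").astype(float)

# Pair the two columns, then collect the pairs per original row.
s["lon_lat"] = list(zip(s[0], s[1]))
df["lon_lat"] = s.groupby(level=0)["lon_lat"].apply(list)
```

The separator passed to `str.join` is a space on purpose: joining with a comma would let the pattern match an altitude together with the next longitude.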

edited Feb 12, 2018 at 16:50

answered Feb 11, 2018 at 22:32


7 Comments

Takusui Over a year ago

It seems like the list 's' is empty. The line 's = df["geometry"].str.extractall(pairs)' is not doing anything and if I try to print s out, I only get an empty dataframe with 0 as the column name.

Mr. T Over a year ago

You always should provide a Minimal, Complete and Verifiable example. I worked with the information given here, but seemingly, the field geometry differs from your description. Can you upload a sample file and describe how you create the dataframe, so I can adapt the script?

Takusui Over a year ago

Updated post with new example.

Takusui Over a year ago

Is it possible that df['geometry'].str.extractall(pairs) doesn't work because the dtype is 'object'?

Mr. T Over a year ago

I updated the code, it works now with unequal lengths, too. The solution comes from Wen, please give him an upvote for his contribution.


ascripter · Answer · 2018-02-11 21:47:31Z

1

In your code, each pass through the inner loop assigns a single longitude and latitude value to the entire column, so only the values from the last iteration survive. You may also want to convert the strings to floats:

# Separating the latitude and longitude values from each string.
for i in range(0, len(rloc)):
    r['longitude'][i] = []
    r['latitude'][i] = []
    for j in range(0, len(rloc[i])):
        coord = rloc[i][j].split(',')
        r['longitude'][i].append(float(coord[0]))
        r['latitude'][i].append(float(coord[1]))
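Depending on the pandas version, chained indexing such as `r['longitude'][i] = []` can raise a warning or leave `r` unchanged. A sketch of the same loop that avoids it, assuming a hypothetical toy frame in place of the real data, collects the values in plain lists and assigns each column once:

```python
import pandas as pd

# Hypothetical toy frame standing in for the real data.
r = pd.DataFrame({
    "OBJECTID": [1, 2],
    "geometry": [
        ["-80.304852,-3.489302,0.0", "-80.303087,-3.490214,0.0"],
        ["-80.27494,-3.496571,0.0"],
    ],
})

longitudes, latitudes = [], []
for points in r["geometry"]:
    # Split every "lon,lat,alt" string of this row once.
    parts = [point.split(",") for point in points]
    longitudes.append([float(p[0]) for p in parts])
    latitudes.append([float(p[1]) for p in parts])

# One assignment per column; wrapping in a Series keeps each cell a list.
r["longitude"] = pd.Series(longitudes, index=r.index)
r["latitude"] = pd.Series(latitudes, index=r.index)
```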

Going for the bonus :)

for i in range(0, len(rloc)):
    r['geometry'][i] = [
        (
            float(element.split(',')[0]),
            float(element.split(',')[1])
        ) for element in r['geometry'][i]
    ]
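One point to watch for the geopy use case: the source strings are ordered longitude, latitude, while geopy expects latitude first. A minimal sketch, assuming a hypothetical toy frame and a helper named `to_lat_lon` introduced here for illustration, builds the tuples in (latitude, longitude) order:

```python
import pandas as pd

# Hypothetical toy frame standing in for the real data.
r = pd.DataFrame({
    "geometry": [
        ["-80.304852,-3.489302,0.0", "-80.303087,-3.490214,0.0"],
        ["-80.27494,-3.496571,0.0"],
    ],
})

def to_lat_lon(points):
    """Turn ['lon,lat,alt', ...] into [(lat, lon), ...], dropping the altitude."""
    result = []
    for point in points:
        lon, lat = (float(value) for value in point.split(",")[:2])
        result.append((lat, lon))
    return result

# New column, so the original geometry strings stay available.
r["location"] = r["geometry"].apply(to_lat_lon)
```

Writing to a separate `location` column rather than overwriting `geometry` is a design choice of this sketch, not part of the answer above.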

answered Feb 11, 2018 at 21:47


1 Comment

Takusui Over a year ago

Thanks! This works! I chose the other one as the answer because it's much more computationally efficient, as I have over one million entries to process.
