Using the following data in train_data_sample and the code below, how can I iterate through the latitude and longitude at each index? (See the desired results below.)

   latitude longitude price
0   55.6632 12.6288 2595000
1   55.6637 12.6291 2850000
2   55.6637 12.6291 2850000
3   55.6632 12.6290 3198000
4   55.6632 12.6290 2995000
5   55.6638 12.6294 2395000
6   55.6637 12.6291 2995000
7   55.6642 12.6285 4495000
8   55.6632 12.6285 3998000
9   55.6638 12.6294 3975000

from numpy import cos, sin, arcsin, sqrt
from math import radians

def haversine(row):
   
    for index in train_data_sample.index:
        lon1 = train_data_sample["longitude"].loc[train_data_sample.index==index]
        lat1 = train_data_sample["latitude"].loc[train_data_sample.index==index]
        lon2 = row['longitude']
        lat2 = row['latitude']
        lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
        dlon = lon2 - lon1 
        dlat = lat2 - lat1 
        a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
        c = 2 * arcsin(sqrt(a)) 
        km = 6367 * c
    return km

def insert_dist(df):
    df["distance"+str(index)] = df.apply(lambda row: haversine(row), axis=1)
    return df

print(insert_dist(train_data_sample))

This is the result for index 0. It compares the coordinates of index 0 with every other row and returns the distance in kilometers, so the distance between the coordinates of index 0 and index 1 is roughly 59 meters (0.058658 km).

latitude    longitude   price   distance0
0   55.6632 12.6288 2595000    0.000000
1   55.6637 12.6291 2850000    0.058658
2   55.6637 12.6291 2850000    0.058658
3   55.6632 12.6290 3198000    0.012536
4   55.6632 12.6290 2995000    0.012536
5   55.6638 12.6294 2395000    0.076550
6   55.6637 12.6291 2995000    0.058658
7   55.6642 12.6285 4495000    0.112705
8   55.6632 12.6285 3998000    0.018804
9   55.6638 12.6294 3975000    0.076550

The end result should return not only distance0, but also distance1, distance2, etc.

2 Answers

It seems like you're making things a bit more complicated than necessary. By nesting one for loop inside another you can achieve what you want in a more straightforward way.

from numpy import cos, sin, arcsin, sqrt
from math import radians
import pandas as pd
import numpy as np


# recreate your dataframe
data = [[55.6632, 12.6288, 2595000],
        [55.6637, 12.6291, 2850000],
        [55.6637, 12.6291, 2850000], 
        [55.6632, 12.6290, 3198000]]

data = np.array(data)

train_data_sample = pd.DataFrame(data, columns = ["latitude", "longitude", "price"])


# copied  "distance calculating" code here
def GetDistance(lon1, lat1, lon2, lat2):
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1 
    dlat = lat2 - lat1 
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * arcsin(sqrt(a)) 
    km = 6367 * c
    return km

# loop over every row with iterrows
for index, row in train_data_sample.iterrows():
    
    distances = []
    
    lat1, lon1 = row[["latitude", "longitude"]]
    
    # loop again over every row with iterrows
    for index_2, row_2 in train_data_sample.iterrows():
        lat2, lon2 = row_2[["latitude", "longitude"]]
        # get the distance
        distances.append( GetDistance(lon1, lat1, lon2, lat2) )
        
    # add the column to the dataframe    
    train_data_sample["distance"+str(index)] = distances
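
One way to sanity-check the resulting columns is to recompute a couple of entries by hand and compare them with the distance0 values listed in the question. The sketch below is self-contained and uses only the standard library; the function name `haversine_km` and the `coords` list are hypothetical names introduced for illustration, and the radius of 6367 km is simply taken from the code above.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6367):
    """Great-circle distance in kilometres between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * radius_km * asin(sqrt(a))


# (latitude, longitude) of rows 0, 1 and 3 from the question's sample
coords = [(55.6632, 12.6288), (55.6637, 12.6291), (55.6632, 12.6290)]

d01 = haversine_km(*coords[0], *coords[1])  # row 0 vs row 1
d03 = haversine_km(*coords[0], *coords[2])  # row 0 vs row 3
print(round(d01, 6), round(d03, 6))
```

The two printed values should be close to the 0.058658 and 0.012536 shown for rows 1 and 3 in the question's distance0 column. Note the argument order here is (lat, lon), which differs from GetDistance above.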

I would not use apply here, as it works row-wise; I would go for a matrix approach using NumPy instead.

First convert all degrees into radians:

df['latitude'] *= np.pi/180
df['longitude'] *= np.pi/180

Then turn the latitude and longitude vectors into matrices by duplicating the vector as many times as the length of the vector. For lat2/lon2 take the transpose.

lat1 = np.tile(df['latitude'].values.reshape([-1,1]),(1,df.shape[0]))
lon1 = np.tile(df['longitude'].values.reshape([-1,1]),(1,df.shape[0]))

lat2 = np.transpose(lat1)
lon2 = np.transpose(lon1)
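
As a side note, the same pairwise differences can probably be obtained without materialising the tiled matrices, by relying on NumPy broadcasting between a column vector and a row vector. The following is a minimal sketch under that assumption; the three sample latitudes and all variable names are made up for illustration.

```python
import numpy as np

# Hypothetical sample: three latitudes, converted from degrees to radians
lat = np.radians(np.array([55.6632, 55.6637, 55.6642]))
n = lat.shape[0]

# Explicit (n, n) matrices, built the same way as with np.tile above
lat1_tiled = np.tile(lat.reshape([-1, 1]), (1, n))
lat2_tiled = np.transpose(lat1_tiled)

# Broadcasting: a column vector and a row vector, no copies of the data
lat1_col = lat[:, np.newaxis]  # shape (n, 1)
lat2_row = lat[np.newaxis, :]  # shape (1, n)

# Both routes produce the same (n, n) matrix of pairwise differences
dlat_tiled = lat2_tiled - lat1_tiled
dlat_broadcast = lat2_row - lat1_col
print(np.array_equal(dlat_tiled, dlat_broadcast))
```

For small dataframes the difference is negligible; broadcasting mainly saves memory when the number of rows grows.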

Now you have four matrices that together contain every combination of lat/lon pairs, so you can apply the haversine formula to all of them at once (with sin, cos, arcsin and sqrt imported from numpy, as in the question) and get all the distances in one go:

dlon = lon2 - lon1 
dlat = lat2 - lat1 
a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
c = 2 * arcsin(sqrt(a)) 
km = 6367 * c

This result can be stitched to your original dataframe:

result = pd.concat([df,pd.DataFrame(km,columns=df.index)],axis=1)
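
Putting the steps together, an end-to-end version might look like the sketch below. It is not the exact code from this answer: the function name `add_pairwise_distances` is hypothetical, it uses broadcasting instead of np.tile, it leaves the original latitude/longitude columns in degrees, and it names the new columns distance0, distance1, ... to match the output asked for in the question.

```python
import numpy as np
import pandas as pd


def add_pairwise_distances(df, radius_km=6367):
    """Return a copy of df with one distance<i> column (in km) per row of df."""
    # Convert to radians in separate arrays so df itself is not modified
    lat = np.radians(df["latitude"].to_numpy())
    lon = np.radians(df["longitude"].to_numpy())

    # Element [i, j] compares row i (column vector) with row j (row vector)
    dlat = lat[np.newaxis, :] - lat[:, np.newaxis]
    dlon = lon[np.newaxis, :] - lon[:, np.newaxis]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat[:, np.newaxis]) * np.cos(lat[np.newaxis, :])
         * np.sin(dlon / 2) ** 2)
    km = 2 * radius_km * np.arcsin(np.sqrt(a))

    columns = [f"distance{i}" for i in df.index]
    distances = pd.DataFrame(km, index=df.index, columns=columns)
    return pd.concat([df, distances], axis=1)


# First four rows of the question's sample data
sample = pd.DataFrame({
    "latitude": [55.6632, 55.6637, 55.6637, 55.6632],
    "longitude": [12.6288, 12.6291, 12.6291, 12.6290],
    "price": [2595000, 2850000, 2850000, 3198000],
})
result = add_pairwise_distances(sample)
print(result.round(6))
```

Because the haversine distance is symmetric, the resulting block of distance columns is a symmetric matrix with zeros on the diagonal, which is a cheap property to check on real data.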
