Calculating distance between rows of Pandas dataframe and adding to list


Asked 1 year, 6 months ago

Modified 1 year, 6 months ago

Viewed 154 times

The question I'm asking is similar to the one I posted here a while ago: Comparing 2 Pandas dataframes row by row and performing a calculation on each row

I got a very helpful answer to that question and I'm trying to use that information to help me answer my current question.

Task: Group a dataframe by columns trial, RECORDING_SESSION_LABEL, and IP_INDEX. For each group, I need to calculate the Euclidean distance between a row and all rows above it (so from Row 2 to Row n) using the values in columns CURRENT_FIX_X and CURRENT_FIX_Y. If the distance is less than 58.93, I need to add the value of CURRENT_FIX_INDEX from the row I'm comparing to (not against) to a list, and then concatenate that list into a string and add it to a new column (refix_list) so the string is in the new column of the row I'm comparing against.

Example: I'm on Row 7, so I'm comparing the distance of Row 7 to Rows 6, 5, 4, 3, 2, and 1 of that group. If the distances between Row 7 and Rows 5, 3, and 1 are each less than 58.93, I want a comma-separated string containing the CURRENT_FIX_INDEX values of those 3 rows in the refix_list column at Row 7.
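To make the check concrete, here is a minimal sketch for a single row, using the five rows of the trial 1, IP_INDEX 2 group as they appear in the expected output further down. The array names, the `threshold` variable, and the position `i` are illustrative assumptions, not part of my actual code.

```python
import numpy as np

# One group (trial 1, IP_INDEX 2), rows in their original order.
fix_index = np.array([1, 2, 3, 4, 5])
x = np.array([300, 500, 500, 275, 550], dtype=float)
y = np.array([650, 300, 325, 675, 300], dtype=float)
threshold = 58.93

i = 4  # position of the row being checked (CURRENT_FIX_INDEX 5)

# Euclidean distance from row i to every row above it in the group
dist = np.hypot(x[:i] - x[i], y[:i] - y[i])

# Keep the rows that are close enough, most recent row first
matches = fix_index[:i][dist < threshold][::-1]
refix = ", ".join(map(str, matches))
print(refix)
```

Here the last row is 50 units from the second row and about 55.9 units from the third, so both fall under the threshold while the other two rows do not.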

Problem: I have code that I'm working with, but I'm not sure it's working, because I get 'ValueError: Length of values (0) does not match length of index (297)' when I try to print the df. So I know there's an issue either creating the list or, more likely, concatenating it into a string and assigning it to the specific row.
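The error message itself can be reproduced on a tiny frame, which suggests it comes from assigning an empty list as a new column rather than from the distance calculation. A minimal sketch, where the `demo` frame is made up purely for illustration:

```python
import pandas as pd

demo = pd.DataFrame({"CURRENT_FIX_X": [550, 575, 250]})

try:
    # An empty list has length 0, but the frame has 3 rows
    demo["refix_list"] = []
    raised = False
except ValueError as err:
    raised = True
    print(err)

# A scalar is broadcast to every row, so this creates the column cleanly
demo["refix_list"] = ""
print(demo)
```

A list assigned as a column has to have one element per row, whereas a scalar such as an empty string is broadcast to all rows.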

Here's the code I'm working with (with sample data for 1 participant):

import pandas as pd
import numpy as np

data_df = {
    'IP_INDEX': [1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 1, 1, 2, 3, 3, 3, 4, 4, 4, 4],
    'RECORDING_SESSION_LABEL': ['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a'],
    'trial': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
    'CURRENT_FIX_INDEX': [1, 2, 3, 1, 2, 3, 4, 5, 1, 2, 3, 1, 1, 2, 1, 1, 2, 3, 1, 2, 3, 4],
    'CURRENT_FIX_X': [550, 575, 250, 300, 500, 500, 275, 550, 675, 650, 800, 325, 450, 400, 375, 650, 700, 675, 825, 400, 375, 150],
    'CURRENT_FIX_Y': [275, 250, 600, 650, 300, 325, 675, 300, 850, 875, 250, 625, 225, 150, 675, 250, 300, 275, 150, 225, 250, 650]

}

# Create DF1
df = pd.DataFrame(data_df)

# Define a function to calculate Euclidean distance
def euclidean_distance(x1, y1, x2, y2):
    return np.sqrt((x1 - x2)**2 + (y1 - y2)**2)

# Grouping the DataFrame by RECORDING_SESSION_LABEL, trial, and IP_INDEX
grouped = df.groupby(['RECORDING_SESSION_LABEL', 'trial', 'IP_INDEX'])

# List to store CURRENT_FIX_INDEX for each row
index_list = []
refix_values = []

# Iterate over each group
for group_name, group_df in grouped:
    # Sort the group_df by some unique column
    group_df = group_df.sort_values(by='trial')
    
    # Calculate Euclidean distance for each row
    for i, row in group_df.iterrows():
        current_x = row['CURRENT_FIX_X']
        current_y = row['CURRENT_FIX_Y']
        
        # Calculate distance with every row above it
        for j, prev_row in group_df.iloc[:i].iterrows():
            current_index = prev_row['CURRENT_FIX_INDEX']
            prev_x = prev_row['CURRENT_FIX_X']
            prev_y = prev_row['CURRENT_FIX_Y']
            
            distance = euclidean_distance(current_x, current_y, prev_x, prev_y)
            
            # If distance is less than or equal to 58.93, store CURRENT_FIX_INDEX
            if distance <= 58.93:
                index_list.append(current_index)
    refix_values.append(','.join(map(str, index_list))) #Add list of matching INDEX values to list of lists

df['refix_list'] = []

# Iterate over the DataFrame to access each row and its index
for index, row in df.iterrows():
    # Assign the list to the current row in the specified column
    df.at[index, refix_list] = refix_values

print(df)

Expected Output:

IP_INDEX	RECORDING_SESSION_LABEL	trial	CURRENT_FIX_INDEX	CURRENT_FIX_X	CURRENT_FIX_Y	refix_list
1	a	1	1	550	275
1	a	1	2	575	250	1
1	a	1	3	250	600
2	a	1	1	300	650
2	a	1	2	500	300
2	a	1	3	500	325	2
2	a	1	4	275	675	1
2	a	1	5	550	300	3, 2
3	a	1	1	675	850
3	a	1	2	650	875	1
3	a	1	3	800	250
4	a	1	1	325	625
1	a	2	1	450	225
1	a	2	2	400	150
2	a	2	1	375	675
3	a	2	1	650	250
3	a	2	2	700	300
3	a	2	3	675	275	2, 1
4	a	2	1	825	150
4	a	2	2	400	225
4	a	2	3	375	250	2
4	a	2	4	150	650

From my limited knowledge, I'm guessing the issue is in the last block of code, but I'm not positive. Any help is appreciated!
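For reference, one possible way to get the expected output is sketched below. It is not a definitive implementation: the helper `refix_strings` and the constants `THRESHOLD` and `GROUP_COLS` are hypothetical names, the sample values are copied from the expected output table above, and it assumes a strict "less than" comparison with matches listed from the most recent row upward.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'IP_INDEX': [1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 1, 1, 2, 3, 3, 3, 4, 4, 4, 4],
    'RECORDING_SESSION_LABEL': ['a'] * 22,
    'trial': [1] * 12 + [2] * 10,
    'CURRENT_FIX_INDEX': [1, 2, 3, 1, 2, 3, 4, 5, 1, 2, 3, 1, 1, 2, 1, 1, 2, 3, 1, 2, 3, 4],
    'CURRENT_FIX_X': [550, 575, 250, 300, 500, 500, 275, 550, 675, 650, 800,
                      325, 450, 400, 375, 650, 700, 675, 825, 400, 375, 150],
    'CURRENT_FIX_Y': [275, 250, 600, 650, 300, 325, 675, 300, 850, 875, 250,
                      625, 225, 150, 675, 250, 300, 275, 150, 225, 250, 650],
})

THRESHOLD = 58.93
GROUP_COLS = ['RECORDING_SESSION_LABEL', 'trial', 'IP_INDEX']


def refix_strings(group):
    """Return one comma-separated string per row of a single group."""
    x = group['CURRENT_FIX_X'].to_numpy(dtype=float)
    y = group['CURRENT_FIX_Y'].to_numpy(dtype=float)
    fix_index = group['CURRENT_FIX_INDEX'].to_numpy()

    # dist[i, j] is the Euclidean distance between rows i and j of the group
    dist = np.hypot(x[:, None] - x[None, :], y[:, None] - y[None, :])

    result = []
    for i in range(len(group)):
        # Only rows above row i are considered; most recent row comes first
        close = fix_index[:i][dist[i, :i] < THRESHOLD][::-1]
        result.append(', '.join(map(str, close)))
    return result


# Start with an empty string in every row, then fill in group by group
df['refix_list'] = ''
for _, group in df.groupby(GROUP_COLS, sort=False):
    df.loc[group.index, 'refix_list'] = refix_strings(group)

print(df)
```

Building a fresh list of matches for every row, and writing the results back through each group's own index, keeps every string attached to the row it was computed for.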

edited May 11, 2024 at 0:35

asked May 10, 2024 at 15:39

Eslifkin


you should provide data of what your input/output data is supposed to look like

– iBeMeltin
Commented May 10, 2024 at 18:09

Are you entirely sure of row 8? `2 a 1 5 550 300 3, 2`?

– Serge de Gosson de Varennes
Commented May 11, 2024 at 19:33

@SergedeGossondeVarennes yes. The distance between row 8 and rows 5 and 6 is less than the specified amount. Since I need the data to be grouped by trial, recording_session_label, and ip_index, the calculation for row 8 would stop at row 4. Thus, it wouldn't check if row 8 matched with any rows above that.

– Eslifkin
Commented May 12, 2024 at 14:50
