Python Pandas: Selecting Value from row when two values in that row match a value farther up the column

Question

The title is a little convoluted, but hopefully this will help. For each row, I want to retrieve the entry in the values column from the row where variableA == variableB == the current row's variableB. For example, for the first row the result should be 54, because the only row that meets those conditions is row 3. However, if variableA == variableB in the current row, the result should be 0. Example data:

    values    variableA  variableB
  0  134       1             3
  1  12        2             6
  2  43        1             2
  3  54        3             3
  4  16        2             7
  5  37        6             6

Desired Result:

    values    variableA  variableB  result
  0  134       1             3      54
  1  12        2             6      37
  2  43        1             2      16
  3  54        3             3      0
  4  16        2             7      NaN
  5  37        6             6      0
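A plain loop makes the rule from the question explicit. This is a minimal sketch, not code from the question or the answer; the helper name double_match_result is hypothetical, and it assumes at most one row per key has variableA == variableB (it takes the first one if there are more). With the data exactly as shown, row 2 comes out NaN rather than 16, because no row has variableA == variableB == 2; the desired 16 appears to depend on row 4 having variableB == 2, which is the inconsistency the accepted answer points out.

```python
import math

import pandas as pd

df = pd.DataFrame({
    "values":    [134, 12, 43, 54, 16, 37],
    "variableA": [1, 2, 1, 3, 2, 6],
    "variableB": [3, 6, 2, 3, 7, 6],
})


def double_match_result(frame: pd.DataFrame) -> list:
    """Hypothetical helper: apply the rule row by row (O(n^2), for clarity only)."""
    out = []
    for a_i, b_i in zip(frame["variableA"], frame["variableB"]):
        if a_i == b_i:
            out.append(0.0)  # current row already has variableA == variableB
            continue
        # Rows j with variableA == variableB == this row's variableB
        hits = frame.loc[(frame["variableA"] == b_i) & (frame["variableB"] == b_i), "values"]
        # Assumes at most one such row; NaN when there is none
        out.append(float(hits.iloc[0]) if len(hits) else math.nan)
    return out


df["result"] = double_match_result(df)
print(df)
```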

Ignoring the 0 result for rows where variableA and variableB already match, my attempt was:

vars = df[['variableA', 'variableB']].values
doublematch = (vars[:, None] == vars[None, :] == vars[:, [0]]).all(-1)
df['result'] = df['values'].values @ doublematch #python3

but that clearly didn't work. Thanks!
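The attempt fails for a plain Python reason before any of the matching logic matters: x == y == z is evaluated as (x == y) and (y == z), and the and calls bool() on a multi-element array, which raises ValueError ("The truth value of an array with more than one element is ambiguous"). Below is a minimal sketch of the same broadcasting idea with each comparison done separately and combined with &, assuming at most one double-match row per key; the names a, b, vals and match are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "values":    [134, 12, 43, 54, 16, 37],
    "variableA": [1, 2, 1, 3, 2, 6],
    "variableB": [3, 6, 2, 3, 7, 6],
})

a = df["variableA"].to_numpy()
b = df["variableB"].to_numpy()
vals = df["values"].to_numpy()

# match[i, j] is True when row j has variableA == variableB == row i's variableB.
# The two comparisons are separate broadcasts joined with &, because a chained
# x == y == z on arrays calls bool() on an array and raises ValueError.
match = (a[None, :] == b[:, None]) & (b[None, :] == b[:, None])

# Row i picks up the values of its matching row (assumes at most one match).
result = (match.astype(int) @ vals).astype(float)
result[~match.any(axis=1)] = np.nan  # no double-match row exists
result[a == b] = 0.0                 # current row already has variableA == variableB
df["result"] = result
print(df)
```

This builds an n-by-n boolean matrix, so memory grows quadratically with the number of rows; it is only practical for moderately sized frames.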

Is there always a one-to-one match further up the column? For example, when variableB is 2 in the third row, there are two 2s but only one further ahead. What happens if there is more than one 2 further ahead? Does that ever happen?
– Ted Petrou, Commented Jan 22, 2017 at 5:06
If I understand your question correctly, yes, each value in variableA and variableB occurs more than twice. However, there should be only one row where both columns hold the same value (that is, variableA == variableB for that row). The columns were generated from a frozenset tuple representing every unique tuple from a list N elements long.
– Flow Nuwen, Commented Jan 22, 2017 at 21:45
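The comment above states an invariant that any lookup-based solution relies on: at most one row per key has variableA == variableB. A minimal sketch of a quick check for it, assuming the frame from the question (the names diag and dupes are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "values":    [134, 12, 43, 54, 16, 37],
    "variableA": [1, 2, 1, 3, 2, 6],
    "variableB": [3, 6, 2, 3, 7, 6],
})

# Rows whose two variables agree act as the lookup rows for that key.
diag = df[df["variableA"] == df["variableB"]]

# Any key that appears in more than one such row breaks the one-match assumption.
dupes = diag[diag["variableB"].duplicated(keep=False)]
if dupes.empty:
    print("OK: each key has at most one row with variableA == variableB")
else:
    print("Keys with several matching rows:\n", dupes)
```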

Ted Petrou · Accepted Answer · 2017-01-22 05:13:28Z

Your example data is inconsistent: there is no row 5 in the upper dataframe, and in the bottom dataframe the row with index 4 has variableB changed to 2. Nonetheless, here is a solution that uses join and then keeps the last row of any duplicate matches.

Here is the data I am using; I added the extra row from your result dataframe.

    values    variableA  variableB
  0  134       1             3
  1  12        2             6
  2  43        1             2
  3  54        3             3
  4  16        2             7
  5  37        6             6 


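import pandas as pd
df = pd.DataFrame({'values': [134, 12, 43, 54, 16, 37],
                   'variableA': [1, 2, 1, 3, 2, 6],
                   'variableB': [3, 6, 2, 3, 7, 6]})

# Lookup Series: index = variableA, values = 'values', named 'result'
s = df.set_index('variableA')['values'].rename('result')
# Join variableB against that index; repeated variableA keys give repeated rows
df_final = df.join(s, on='variableB')
# Rows where variableA == variableB get 0
df_final.loc[df_final['variableA'] == df_final['variableB'], 'result'] = 0
# Drop the join duplicates, keeping the last match for each original row
df_final = df_final.reset_index().drop_duplicates('index', keep='last').set_index('index')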

       values  variableA  variableB  result
index                                      
0         134          1          3    54.0
1          12          2          6    37.0
2          43          1          2    16.0
3          54          3          3     0.0
4          16          2          7     NaN
5          37          6          6     0.0
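The join above matches on variableA alone and relies on keep='last' to choose among duplicates, which is why row 2 picks up 16 from row 4 (variableA 2, variableB 7) without a true double match. A minimal sketch that applies the question's stated rule directly, assuming at most one row per key has variableA == variableB (Series.map needs unique keys), is to map variableB onto a lookup built only from those rows. This is an alternative technique, not the accepted answer's method; with the data as shown, row 2 becomes NaN instead of 16.

```python
import pandas as pd

df = pd.DataFrame({
    "values":    [134, 12, 43, 54, 16, 37],
    "variableA": [1, 2, 1, 3, 2, 6],
    "variableB": [3, 6, 2, 3, 7, 6],
})

is_diag = df["variableA"] == df["variableB"]

# Lookup built only from rows where variableA == variableB:
# shared key -> that row's 'values' (assumes each key appears once here).
lookup = df.loc[is_diag].set_index("variableB")["values"]

df["result"] = df["variableB"].map(lookup)  # NaN where no double-match row exists
df.loc[is_diag, "result"] = 0               # current row already matches itself
print(df)
```

Because each row is looked up once, this avoids both the duplicate rows from the join and the quadratic matrix of a broadcasting approach.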
