
The title is a little convoluted, but hopefully this will help. For each row, I want to retrieve the value of the values column from the row where variableA == variableB == variableB of the current row. For example, for the first row, the result will be 54 because the only row where those conditions are met is row 3. However, if variableA == variableB in the current row, the result should be 0. Example data:

    values    variableA  variableB
  0  134       1             3
  1  12        2             6
  2  43        1             2
  3  54        3             3
  4  16        2             7
  5  37        6             6

Desired Result:

    values    variableA  variableB  result
  0  134       1             3      54
  1  12        2             6      37
  2  43        1             2      16
  3  54        3             3      0
  4  16        2             7      NaN
  5  37        6             6      0

Not taking into consideration the 0 result when variableA and variableB match in the current row, my attempt:

vars = df[['variableA', 'variableB']].values
doublematch = (vars[:, None] == vars[None, :] == vars[:, [0]]).all(-1)
df['result'] = df['values'].values @ doublematch #python3

but that clearly didn't work. Thanks!

  • Is there always a one-to-one match further up the column? For example, when variableB is 2 in the third row there are two 2's, but only one further ahead. What happens if there is more than one 2 further ahead? Does that ever happen? Commented Jan 22, 2017 at 5:06
  • If I understand your question correctly, yes: each value of variableA and variableB occurs more than twice. However, there should be only one row that has the same value in both columns (that is, variableA == variableB for that row). The columns were generated from a frozenset tuple representing every unique tuple from a list N elements long. Commented Jan 22, 2017 at 21:45

1 Answer

Your example data is inconsistent: there is no row 5 in the upper dataframe, and in the bottom dataframe the row with index 4 has variableB changed to 2. Nonetheless, here is a solution that uses join and then keeps the last row of any duplicate matches.

Here is the data I am using - I added an extra row from your result dataframe.

    values    variableA  variableB
  0  134       1             3
  1  12        2             6
  2  43        1             2
  3  54        3             3
  4  16        2             7
  5  37        6             6 
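
To make the steps reproducible, the table above can be built as a DataFrame roughly like this. It is only a sketch: the name df matches the code that follows, and the integer dtypes are an assumption based on how the data is printed.

```python
import pandas as pd

# Rebuild the six example rows shown above (integer dtypes are assumed)
df = pd.DataFrame({
    "values":    [134, 12, 43, 54, 16, 37],
    "variableA": [1, 2, 1, 3, 2, 6],
    "variableB": [3, 6, 2, 3, 7, 6],
})
```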


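# Lookup Series: index is variableA, data is values; its name becomes the new column
s = df.set_index('variableA')['values'].rename('result')

# Match each row's variableB against the variableA keys (duplicate keys add extra rows)
df_final = df.join(s, on='variableB')

# Rows whose own variableA equals variableB get 0
df_final.loc[df_final['variableA'] == df_final['variableB'], 'result'] = 0
# Keep only the last match for each original row
df_final = df_final.reset_index().drop_duplicates('index', keep='last').set_index('index')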

       values  variableA  variableB  result
index                                      
0         134          1          3    54.0
1          12          2          6    37.0
2          43          1          2    16.0
3          54          3          3     0.0
4          16          2          7     NaN
5          37          6          6     0.0
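
As for the broadcasting attempt in the question: Python evaluates a chained comparison such as x == y == z as (x == y) and (y == z), and applying "and" to a multi-element NumPy array raises a ValueError, so that line cannot work as written. If a pure NumPy version is preferred, a rough sketch follows. It assumes the same rule the join applies (match the current row's variableB against variableA of any row, keep the last match, NaN when there is none, 0 when variableA == variableB in the current row). The names a, b, vals, match and last are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "values":    [134, 12, 43, 54, 16, 37],
    "variableA": [1, 2, 1, 3, 2, 6],
    "variableB": [3, 6, 2, 3, 7, 6],
})

a = df["variableA"].to_numpy()
b = df["variableB"].to_numpy()
vals = df["values"].to_numpy(dtype=float)

# match[i, j] is True when row i's variableB equals row j's variableA
match = b[:, None] == a[None, :]

# Position of the last True in each row: argmax on the reversed columns
# finds the first True from the right (it returns 0 when a row has no True)
last = len(df) - 1 - np.argmax(match[:, ::-1], axis=1)

# Use the matched value where a match exists, NaN otherwise
result = np.where(match.any(axis=1), vals[last], np.nan)

# Rows whose own variableA equals variableB get 0
result[a == b] = 0.0

df["result"] = result
```

Note that the match matrix has one cell per pair of rows, so memory grows quadratically with the number of rows; for larger frames the join is likely the better fit.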