Add a column to Pandas DataFrame with multiple lookups based on other columns

Question

I am trying to add a column (col5) to the DataFrame below, where the value in col5 must be a value from col4 that satisfies certain conditions on the other columns of the same row. For example, for row 1, col5 should hold the col4 value from the row whose col1 and col2 match row 1 but whose col3 differs from row 1. In Excel this can be done using SUMIFS, as shown in the image. Any help is appreciated. I have updated my question based on the answer from Paul.

import pandas as pd

df=pd.DataFrame({"col1":[1,1,1,1,2,2,2,2], "col2":['a','a','b', 'b','c', 'c', 'd', 'd'], "col3":['p','q','p', 'q', 'p','q','p', 'q'], 'col4':[100,200,300,400,500,600,700,800]})

What I want to accomplish is something like the code below: add a col5 by checking conditions on the other columns, where col1 and col2 should match but col3 should not. col3 is assumed to have only two distinct values, so "col3 should not match" means col3 takes the other value.

df2 = df

df['col5'] = df[(df.col1 == df2.col1) & (df.col2 == df2.col2) & (df.col3 != df2.col3)].col4

df

>>>

  col1 col2 col3 col4   col5
0   1   a    p   100    NaN
1   1   a    q   200    NaN
2   1   b    p   300    NaN
3   1   b    q   400    NaN
4   2   c    p   500    NaN
5   2   c    q   600    NaN
6   2   d    p   700    NaN
7   2   d    q   800    NaN

When I run this I get all NaN in col5 as shown above.
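
A likely reason for the all-NaN result, sketched here on the same sample data: `df2 = df` does not copy the frame, it only binds a second name to the same object. Each row is therefore compared with itself, `df.col3 != df2.col3` is False everywhere, and the selection is empty. The names `mask` and `selected` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [1, 1, 1, 1, 2, 2, 2, 2],
    "col2": ["a", "a", "b", "b", "c", "c", "d", "d"],
    "col3": ["p", "q", "p", "q", "p", "q", "p", "q"],
    "col4": [100, 200, 300, 400, 500, 600, 700, 800],
})

df2 = df  # a second name for the same object, not a copy

# Row i is compared only with row i, so the "col3 differs" part never holds.
mask = (df.col1 == df2.col1) & (df.col2 == df2.col2) & (df.col3 != df2.col3)
selected = df[mask].col4  # empty Series

# Assigning an empty Series aligns on the index and fills every row with NaN.
df["col5"] = selected

print(df2 is df)    # True
print(mask.any())   # False
```

A row-by-row comparison against the whole frame, or a join of the frame with itself, is needed to find the matching row.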

What I want to get is shown below. Here the arrangement makes it look as simple as taking the value from the next or previous row, but in the full data the matching row can be anywhere.

>>>

  col1 col2 col3 col4   col5
0   1   a    p   100    200
1   1   a    q   200    100
2   1   b    p   300    400
3   1   b    q   400    300
4   2   c    p   500    600
5   2   c    q   600    500
6   2   d    p   700    800
7   2   d    q   800    700

Kindly share sample, reproducible data, with the expected output DataFrame.
– sammywemmy, Commented Dec 14, 2021 at 8:08
I shared a possible answer, not sure if it is the answer you are looking for. Please add an expected output DataFrame to your question, as @sammywemmy suggested.
– Paul, Commented Dec 14, 2021 at 12:26
Sorry that I did not attach the data in a reproducible format. My bad. Thanks a lot, Paul, for doing that.
– sri, Commented Dec 14, 2021 at 16:23

Paul · Accepted Answer · 2021-12-15 08:09:13Z

1

Your question is not entirely clear to me, but what I understood was: check if col1 and col2 are the same as the next row, but col3 is different.

If so: grab the value of col4 of the next row as col5.

import pandas as pd

df=pd.DataFrame({"col1":[1,1,1,1,2,2,2,2], "col2":['a','a','b', 'b','c', 'c', 'd', 'd'], "col3":['p','q','p', 'q', 'p','q','p', 'q'], 'col4':[100,200,300,400,500,600,700,800]})

df2 = df.shift(-1)
df['col5'] = df2[(df.col1 == df2.col1) & (df.col2 == df2.col2) & (df.col3 != df2.col3)].col4

df

        col1    col2    col3    col4    col5
0       1       a       p       100     200.0
1       1       a       q       200     NaN
2       1       b       p       300     400.0
3       1       b       q       400     NaN
4       2       c       p       500     600.0
5       2       c       q       600     NaN
6       2       d       p       700     800.0
7       2       d       q       800     NaN

Update

If you also want the other values to be found, use apply:

df['col5'] = df.apply(
    lambda x: df[
        (df.col1 == x.col1) & 
        (df.col2 == x.col2) & 
        (df.col3 != x.col3)
        ].reset_index()['col4'],
    axis=1)

This is a better way to iterate over your rows.
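
For larger frames, the same lookup can also be written without a row-wise apply. The following is only a sketch, not part of the original answer: it merges the frame with itself on col1 and col2, keeps the partner rows whose col3 differs, and sums their col4 per original row. The names `pairs` and `lookup` and the `_other` suffix are made up for illustration, and a default integer index is assumed.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [1, 1, 1, 1, 2, 2, 2, 2],
    "col2": ["a", "a", "b", "b", "c", "c", "d", "d"],
    "col3": ["p", "q", "p", "q", "p", "q", "p", "q"],
    "col4": [100, 200, 300, 400, 500, 600, 700, 800],
})

# Pair each row with every row that shares its col1 and col2.
# reset_index() keeps the original row label in a column named "index".
pairs = df.reset_index().merge(df, on=["col1", "col2"], suffixes=("", "_other"))

# Keep only the partners whose col3 differs from the row's own col3.
pairs = pairs[pairs["col3"] != pairs["col3_other"]]

# Sum the partners' col4 per original row; rows without a partner get NaN
# when the result is aligned back onto df by index.
lookup = pairs.groupby("index")["col4_other"].sum()
df["col5"] = lookup

print(df)
```

The merge builds every pair inside a (col1, col2) group, so memory use grows with the square of the group size. That is fine for small groups like the ones here but worth keeping in mind for large ones.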

edited Dec 15, 2021 at 8:09

answered Dec 14, 2021 at 8:55

Paul


4 Comments

Paul Over a year ago

@sri If my answer is useful/helpful, don't hesitate to upvote it.

sri Over a year ago

Thanks a lot, Paul, for putting the data together in the correct form. I can see that my question was not very clear, but your solution gave me the idea I was probably missing. I could not work out how to write something like df['col1'] = df['col1'] or df['col3'] != df['col3']. From your answer I thought I could at least try something and explain the question better. I have updated my question based on this.

Paul Over a year ago

@sri I updated my answer, it now suits your desired output.

sri Over a year ago

Thanks, this is a much more efficient way than my iterations.

sri · Accepted Answer · 2021-12-15 03:26:32Z

1

I think I figured out how to do this. I am currently iterating over each row of the dataframe to get the job done.

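for i in range(len(df)):
    # Sum col4 over rows sharing col1 and col2 with row i but differing in col3
    same_pair = (df.col1 == df.loc[i, 'col1']) & (df.col2 == df.loc[i, 'col2'])
    other_col3 = df.col3 != df.loc[i, 'col3']
    df.loc[i, 'col5'] = df[same_pair & other_col3].col4.sum()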

df
>>>
 col1 col2 col3 col4 col5
0   1   a   p   100  200.0
1   1   a   q   200  100.0
2   1   b   p   300  400.0
3   1   b   q   400  300.0
4   2   c   p   500  600.0
5   2   c   q   600  500.0
6   2   d   p   700  800.0
7   2   d   q   800  700.0

I would be glad to know if there is a better, more efficient way to do this without iterating. Thanks!
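
One possible vectorized take on this, offered as a sketch rather than a definitive answer: the SUMIFS-style result "sum of col4 where col1 and col2 match and col3 differs" equals the col4 total of the (col1, col2) group minus the col4 total of the (col1, col2, col3) subgroup. The names `pair_total` and `same_col3_total` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [1, 1, 1, 1, 2, 2, 2, 2],
    "col2": ["a", "a", "b", "b", "c", "c", "d", "d"],
    "col3": ["p", "q", "p", "q", "p", "q", "p", "q"],
    "col4": [100, 200, 300, 400, 500, 600, 700, 800],
})

# Total col4 over all rows sharing col1 and col2, broadcast back to each row.
pair_total = df.groupby(["col1", "col2"])["col4"].transform("sum")

# Total col4 over rows that also share col3 (the rows to exclude).
same_col3_total = df.groupby(["col1", "col2", "col3"])["col4"].transform("sum")

# What remains is the sum over rows with matching col1, col2 and a different col3.
df["col5"] = pair_total - same_col3_total

print(df)
```

Like the loop with `.sum()`, this gives 0 rather than NaN for a row that has no partner with a different col3.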

answered Dec 15, 2021 at 3:26

sri
