I have a pandas DataFrame that looks like this:

Col1 Col2 INDX
10 20 0
30 40 1
50 60 1
70 80 0

For each row, I want to select the value from either Col1 or Col2, depending on the value in INDX (0 selects Col1, 1 selects Col2). So the output in the above case should be [10, 40, 60, 70].

I did this by looping through each row of the DataFrame, but it's quite slow. Is there a faster way to accomplish this?

Dummy test code:

for i in range(len(df)):
    print(df.iloc[i, df['INDX'].iloc[i]])

1 Answer

Try lookup:

cols = df.columns[:2]

df.lookup(df.index, cols[df.INDX])

Output:

array([10, 40, 60, 70])

Update: As Scott commented, lookup is deprecated (since pandas 1.2, and removed in 2.0). We can resort to NumPy indexing instead:

df[cols].to_numpy()[np.arange(len(df)), df['INDX']]
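
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the NumPy indexing approach. It rebuilds the sample frame from the question; the variable names (values, rows, picked, alt) are only illustrative, and it assumes INDX holds valid positions into cols (0 or 1 here).

```python
import numpy as np
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    "Col1": [10, 30, 50, 70],
    "Col2": [20, 40, 60, 80],
    "INDX": [0, 1, 1, 0],
})

# Columns to choose from; INDX is a position into this list.
cols = ["Col1", "Col2"]

# 2-D array of candidate values, one row per DataFrame row.
values = df[cols].to_numpy()

# Pair each row number with that row's column position and pick one
# element per row (NumPy integer array indexing).
rows = np.arange(len(df))
picked = values[rows, df["INDX"].to_numpy()]
print(picked.tolist())  # [10, 40, 60, 70]

# With exactly two candidate columns, np.where is a different technique
# that gives the same result for this data.
alt = np.where(df["INDX"].eq(0), df["Col1"], df["Col2"])
```

Converting INDX with to_numpy() keeps the index purely positional, so the result should not depend on the DataFrame's index labels.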

3 Comments

Read through these issues on pandasdev github. github.com/pandas-dev/pandas/issues/39171
Can't help putting a comment in the issue on Github.
