1

heres the problem... Imagine the following dataframe as an example:

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': [3, 4, 5, 6, 7],'col3': [3, 4, 5, 6, 7],'col4': [1, 2, 3, 3, 2]})

Now, I would like to add another column "col 5" which is calculated as follows:

if the value of "col4" is 1, then give me the corresponding value in the column with index 1 (i.e. "col2" in this case), if "col4" is 2 give me the corresponding value in the column with index 2 (i.e. "col3" in this case), etc.

I have tried the below and variations of it, but I can't seem to get the right result

df["col5"] = df.apply(lambda x: df.iloc[x,df[df.columns[df["col4"]]]])

Any help is much appreciated!

2 Answers 2

2

If your 'col4' is the indicator of column index, this will work:

df['col5'] = df.apply(lambda x: x[df.columns[x['col4']]], axis=1)

df

#   col1  col2  col3  col4  col5
#0     1     3     3     1     3
#1     2     4     4     2     4
#2     3     5     5     3     3
#3     4     6     6     3     3
#4     5     7     7     2     7
Sign up to request clarification or add additional context in comments.

2 Comments

Indeed, thanks. I actually tried something similar, but without the axis argument...
Note this solution involves a Python-level loop. You may find a list comprehension better. See this related answer.
1

You can use fancy indexing with NumPy and avoid a Python-level loop altogether:

df['col5'] = df.iloc[:, :4].values[np.arange(df.shape[0]), df['col4']]

print(df)

   col1  col2  col3  col4  col5
0     1     3     3     1     3
1     2     4     4     2     4
2     3     5     5     3     3
3     4     6     6     3     3
4     5     7     7     2     7

You should see significant performance benefits for larger dataframes:

df = pd.concat([df]*10**4, ignore_index=True)

%timeit df.apply(lambda x: x[df.columns[x['col4']]], axis=1)       # 2.36 s per loop
%timeit df.iloc[:, :4].values[np.arange(df.shape[0]), df['col4']]  # 1.01 ms per loop

3 Comments

Amazing, works. Thank you! One more thing: How would I add a condition to this, i.e. only do this if value in col4 is > 1 e.g. and otherwise take 0?
@SanMu, No problem, have updated in fact to match what I think you need, i.e. 1 maps to col2. Feel free to accept a solution that helped.
@SanMu, you can use np.where, e.g. np.where(df['col4'] > 1, ..., 0).

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.