Replace values in multiple columns of dataframe with numpy

Question

I'm trying to replace values in multiple columns of a dataframe with numpy.where in Python by doing the following:

df['X, Y, Z'] = np.where(df['X, Y, Z'] < 1, 0, df['X, Y, Z'])

However, it gives me the following error: KeyError: 'X, Y, Z'

I have already tried passing the column names as separate strings, like 'X', 'Y', 'Z', but that doesn't work either.

How do I resolve this?

yacola · Accepted Answer · 2020-10-21 12:12:37Z


What about passing a proper list of keys ['X', 'Y', 'Z'] to your dataframe instead of one long string 'X, Y, Z'? Starting from some dummy data:

import numpy as np
import pandas as pd

data = {'X': np.linspace(0,2,8), 'Y': np.linspace(0,2,8)*2, 'Z': np.linspace(0,2,8)*4}

df = pd.DataFrame.from_dict(data)

which gives:

>>> df
          X         Y         Z
0  0.000000  0.000000  0.000000
1  0.285714  0.571429  1.142857
2  0.571429  1.142857  2.285714
3  0.857143  1.714286  3.428571
4  1.142857  2.285714  4.571429
5  1.428571  2.857143  5.714286
6  1.714286  3.428571  6.857143
7  2.000000  4.000000  8.000000

df[['X', 'Y', 'Z']] = np.where(df[['X', 'Y', 'Z']] < 1, 0, df[['X', 'Y', 'Z']])

and now there is no longer a KeyError:

>>> df
          X         Y         Z
0  0.000000  0.000000  0.000000
1  0.000000  0.000000  1.142857
2  0.000000  1.142857  2.285714
3  0.000000  1.714286  3.428571
4  1.142857  2.285714  4.571429
5  1.428571  2.857143  5.714286
6  1.714286  3.428571  6.857143
7  2.000000  4.000000  8.000000
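
As a side note, the same replacement can probably also be written with pandas alone. This is only a sketch of an alternative, not what the answer above uses: it swaps np.where for DataFrame.mask, and the names cols and expected are introduced here purely for illustration.

```python
import numpy as np
import pandas as pd

# Same dummy data as in the answer
data = {'X': np.linspace(0, 2, 8), 'Y': np.linspace(0, 2, 8) * 2, 'Z': np.linspace(0, 2, 8) * 4}
df = pd.DataFrame(data)

cols = ['X', 'Y', 'Z']  # illustrative name for the list of keys

# Result of the np.where expression from the answer, kept for comparison
expected = np.where(df[cols] < 1, 0, df[cols])

# DataFrame.mask replaces the entries where the condition is True with 0
df[cols] = df[cols].mask(df[cols] < 1, 0)

# Both approaches should give the same values
print(np.allclose(df[cols].to_numpy(), expected))
```

Either way, the important part is still the list of keys inside the brackets.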

edited Oct 21, 2020 at 12:12

answered Oct 21, 2020 at 12:01

yacola


5 Comments

Kaashoed Over a year ago

I'm sorry but this solution seems a bit confusing to me. I find it weird because my method works if I only put 'X' as the column instead of all three, so I don't know why that's happening.

yacola Over a year ago

df['X', 'Y', 'Z'] is not the same as df[['X', 'Y', 'Z']], that is why. With a single column, df['X'] and df[['X']] both work here (the first returns a Series, the second a one-column DataFrame). pandas doesn't split the string 'X, Y, Z' into 3 distinct keys, and df['X', 'Y', 'Z'] is looked up as one single tuple key rather than 3 distinct keys. To select several columns it needs a list of keys, ['X', 'Y', 'Z']. Hope it's clear now.
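
To make the difference between these lookups concrete, here is a minimal sketch. The small dataframe and the variable names are made up for illustration, and it assumes ordinary (non-MultiIndex) columns.

```python
import pandas as pd

# Hypothetical toy frame with the same column names as in the answer
df = pd.DataFrame({'X': [0.5, 1.5], 'Y': [0.2, 2.0], 'Z': [3.0, 0.1]})

# One long string is looked up as a single column name, which does not exist
try:
    df['X, Y, Z']
    string_key_failed = False
except KeyError:
    string_key_failed = True

# Comma-separated strings inside [] form one tuple key, again a single lookup
try:
    df['X', 'Y', 'Z']
    tuple_key_failed = False
except KeyError:
    tuple_key_failed = True

# A list of keys selects the three columns as a DataFrame
subset = df[['X', 'Y', 'Z']]

# A single key gives a Series; a one-element list gives a one-column DataFrame
single_series = df['X']
single_frame = df[['X']]

print(string_key_failed, tuple_key_failed)
print(type(single_series).__name__, type(single_frame).__name__, subset.shape)
```

Both failing lookups raise KeyError because pandas searches for one column with that exact label, while the list form is treated as a selection of several labels.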

Kaashoed Over a year ago

I understand why my method doesn't work, but the method you use still confuses me. The weird thing is that I remember my code working last week where I used this same method with multiple columns on the same dataframe, so I don't know what I did different now.

Kaashoed Over a year ago

Will do. Can you maybe just explain why you chose the values 0, 2, 8 and the multiplication by 2 and 4?

yacola Over a year ago

In order to generate small dummy columns of 8 samples that span a range between 0 and 2 where your test condition is meaningful (since you failed to provide a Minimal Reproducible Example). I multiplied by 2 and 4 to get different columns where the condition applied differently. This is arbitrary but shows you that it works for various data :)
