
I want to divide all values in the columns whose names match a regex by some value, while still keeping the complete dataframe.

As shown in How to select columns from dataframe by regex, e.g. all columns starting with d can be selected with:

df.filter(regex=("d.*"))

Now that I have selected the columns I need, I want to, e.g., divide their values by 2, which is possible with the following code:

df.filter(regex=("d.*")).divide(2)

However, if I try to update my dataframe like this, I get a "can't assign to function call" SyntaxError:

df.filter(regex=("d.*")) = df.filter(regex=("d.*")).divide(2)

How do I properly update my existing df?
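For reference, a minimal sketch of why the assignment fails and what works instead: df.filter(...) is a method call that returns a new DataFrame, so it cannot appear on the left of =, and dividing its result does not touch df. What can be assigned to are the selected column labels. The small frame below is made-up data for illustration.

```python
import pandas as pd

# Small made-up frame with the same column names used in the answers
df = pd.DataFrame({"d1": [5, 13, 9], "d2": [1, 8, 4], "abc": [8, 6, 7]})

# filter() and divide() each return a NEW DataFrame; df itself is not modified
halved = df.filter(regex="d.*").divide(2)
print(df["d1"].tolist())  # [5, 13, 9]

# A function call cannot be an assignment target, but the selected column
# labels can: index df with them and assign the divided values back
cols = df.filter(regex="d.*").columns
df[cols] = df[cols].divide(2)
print(df)
```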

3 Answers

The following technique is not limited to use with filter and can be applied far more generally.

Setup
I'll use @cᴏʟᴅsᴘᴇᴇᴅ's setup.
Let df be:

   d1  d2  abc
0   5   1    8
1  13   8    6
2   9   4    7
3   9  16   15
4   1  20    9

In-place update
Use pd.DataFrame.update
update modifies the calling dataframe in place: wherever a row index and column label also appear in the argument dataframe, the calling dataframe's value is overwritten with the argument's value.

df.update(df.filter(regex='d.*') / 3)
df

         d1        d2  abc
0  1.666667  0.333333    8
1  4.333333  2.666667    6
2  3.000000  1.333333    7
3  3.000000  5.333333   15
4  0.333333  6.666667    9
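To make the alignment behaviour of update concrete, here is a small sketch with made-up float data (floats are used so no column has to change dtype). Only cells whose row and column labels both appear in the argument are touched, and with the default settings a NaN in the argument leaves the original value in place. update returns None because it works in place.

```python
import numpy as np
import pandas as pd

# Made-up float data, same column names as above
df = pd.DataFrame({"d1": [5.0, 13.0, 9.0], "d2": [1.0, 8.0, 4.0], "abc": [8.0, 6.0, 7.0]})

# The argument covers only rows 0 and 1 of d1/d2, and holds one NaN
other = pd.DataFrame({"d1": [2.5, 6.5], "d2": [0.5, np.nan]}, index=[0, 1])

result = df.update(other)  # modifies df in place
print(result)  # None
# Row 2 and column abc are untouched; d2 in row 1 keeps 8.0 because of the NaN
print(df)
```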

Inline copy
Use pd.DataFrame.assign
I use the double splat ** to unpack the argument dataframe into keyword arguments, where the column names are the keys and the column Series are the values. This matches the signature assign expects, and it overwrites those columns in the copy that assign returns. In short, the result is a copy of the calling dataframe with the matched columns replaced; df itself is unchanged unless you reassign it.

df.assign(**df.filter(regex='d.*').div(3))

         d1        d2  abc
0  1.666667  0.333333    8
1  4.333333  2.666667    6
2  3.000000  1.333333    7
3  3.000000  5.333333   15
4  0.333333  6.666667    9
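A short sketch of the copy semantics, using made-up data. The ** unpacking works because a DataFrame behaves like a mapping (keys() gives the column labels, indexing gives the columns), so the call is equivalent to df.assign(d1=..., d2=...). Note that this only works when the column labels are strings, since they become keyword names.

```python
import pandas as pd

# Made-up frame with the same column names as above
df = pd.DataFrame({"d1": [5, 13, 9], "d2": [1, 8, 4], "abc": [8, 6, 7]})

scaled = df.filter(regex="d.*").div(3)
# Equivalent to df.assign(d1=scaled["d1"], d2=scaled["d2"])
out = df.assign(**scaled)

# assign returned a new frame; the original is unchanged
print(df["d1"].tolist())  # [5, 13, 9]
print(out)
```

To keep the result, rebind the name: df = df.assign(**df.filter(regex='d.*').div(3)).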

Comments

Nice to see an answer from you every now and then. :)
Thanks @cᴏʟᴅsᴘᴇᴇᴅ!
They were all good answers, but since this one is the most general and the shortest in code, I'll accept it.

I think you need to extract the column names and assign:

df[df.filter(regex=("d.*")).columns] = df.filter(regex=("d.*")).divide(2)

Or:

cols = df.columns[df.columns.str.contains('^d.*')]
df[cols] /= 2
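One detail worth checking with both variants: DataFrame.filter(regex=...) and str.contains both search for the pattern anywhere in the label (re.search semantics), so 'd.*' matches every column that merely contains a d, not only those starting with one. The sketch below adds a hypothetical column named odd to show this; anchoring the pattern with ^ restricts the match to the start of the label.

```python
import pandas as pd

# Hypothetical extra column "odd": contains a "d" but does not start with one
df = pd.DataFrame({"d1": [4, 8], "d2": [2, 6], "odd": [1, 3], "abc": [7, 9]})

# Both searches match anywhere in the label
print(list(df.filter(regex="d.*").columns))           # ['d1', 'd2', 'odd']
print(list(df.columns[df.columns.str.contains("d")]))  # ['d1', 'd2', 'odd']

# Anchor the pattern so only labels starting with "d" are selected
cols = df.columns[df.columns.str.contains("^d")]
df[cols] = df[cols] / 2
print(df)
```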

Use df.columns.str.startswith.

c = df.columns.str.startswith('d')    
df.loc[:, c] /= 2

As an example, consider -

df

   d1  d2  abc
0   5   1    8
1  13   8    6
2   9   4    7
3   9  16   15
4   1  20    9

c = df.columns.str.startswith('d')  
c
array([ True,  True, False], dtype=bool)

df.loc[:, c] /= 3    # 3 instead of 2, just for example
df

         d1        d2  abc
0  1.666667  0.333333    8
1  4.333333  2.666667    6
2  3.000000  1.333333    7
3  3.000000  5.333333   15
4  0.333333  6.666667    9

If you need to pass a regex, use str.contains -

c = df.columns.str.contains(p) # p => your pattern

And the rest of your code follows.
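As a sketch of the str.contains route with a more specific pattern than startswith can express, the example below uses a hypothetical pattern ^d\d+$ ("d" followed only by digits) so that a hypothetical column d_total, which also starts with d, is left alone. The data are made up and stored as floats so that the in-place division via loc does not need to change any column's dtype.

```python
import pandas as pd

# Made-up float data; "d_total" starts with "d" but should not be scaled
df = pd.DataFrame({
    "d1": [5.0, 13.0],
    "d2": [1.0, 8.0],
    "d_total": [6.0, 21.0],
    "abc": [8.0, 6.0],
})

p = r"^d\d+$"  # hypothetical pattern: "d" followed only by digits
c = df.columns.str.contains(p)  # boolean mask: [True, True, False, False]
df.loc[:, c] /= 2
print(df)
```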

Comments

Thanks as well. Your answer solves the example question, but I only used that example because it is simplified. My own code needs a more complex regex, so startswith doesn't cut it.
@NumesSanguis Then use df.columns.str.contains, and pass a regex. Still simpler.
df.loc[:, c] /= 2 is lovely, never thought of that one!
@RobinNemeth Yup, you would've seen it in my answer first ;)