Pandas select columns with regex and divide by value

Question

I want to divide all values in the columns whose names match a regular expression by some value, and still keep the complete dataframe.

As described in "How to select columns from dataframe by regex", all columns starting with d, for example, can be selected with:

df.filter(regex=("d.*"))
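One detail worth checking before relying on this pattern: DataFrame.filter applies the pattern with re.search, so an unanchored d.* keeps any label that contains a d, not only labels that start with one. A minimal sketch, where the extra column id is a hypothetical addition that is not part of the question's data:

```python
import pandas as pd

# "id" is a hypothetical extra column, added only to show the difference.
df = pd.DataFrame({"d1": [5, 13], "d2": [1, 8], "abc": [8, 6], "id": [0, 1]})

# Unanchored: matches any label containing "d".
unanchored = list(df.filter(regex="d.*").columns)

# Anchored with ^: matches only labels that start with "d".
anchored = list(df.filter(regex="^d").columns)

print(unanchored)  # ['d1', 'd2', 'id']
print(anchored)    # ['d1', 'd2']
```

With only d1, d2 and abc present, both patterns select the same columns, so the distinction matters only for frames with labels like id.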

Now that I have selected the columns I need, I want to, for example, divide their values by 2, which is possible with the following code:

df.filter(regex=("d.*")).divide(2)

However, if I try to update my dataframe like this, I get the error "can't assign to function call":

df.filter(regex=("d.*")) = df.filter(regex=("d.*")).divide(2)
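The error comes from Python's grammar rather than from pandas: the left-hand side of = has to be something assignable, such as a name, an attribute, or a subscript, and a call like df.filter(...) is none of these. A small sketch that reproduces the error without needing a dataframe, since it is raised at compile time (the exact wording of the message varies between Python versions):

```python
# The statement is kept in a string so that this snippet itself compiles.
statement = 'df.filter(regex="d.*") = df.filter(regex="d.*").divide(2)'

try:
    compile(statement, "<example>", "exec")
    error_message = None
except SyntaxError as exc:
    error_message = exc.msg

print(error_message)

# A subscript, by contrast, is a valid assignment target, so this compiles.
compile("df[cols] = df[cols] / 2", "<example>", "exec")
```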

How to properly update my existing df?

There is an actual method named update that was designed for this exact purpose: stackoverflow.com/a/48259109/2336654.
– piRSquared, Commented Jan 15, 2018 at 16:06

piRSquared · Accepted Answer · 2018-01-15 08:37:54Z

15

The following technique is not limited to use with filter and can be applied far more generally.

Setup
I'll use @cᴏʟᴅsᴘᴇᴇᴅ's setup.
Let df be:

   d1  d2  abc
0   5   1    8
1  13   8    6
2   9   4    7
3   9  16   15
4   1  20    9

Inplace update
Use pd.DataFrame.update
update takes the argument dataframe and alters the calling dataframe in place wherever index and column labels match the argument.

df.update(df.filter(regex='d.*') / 3)
df

         d1        d2  abc
0  1.666667  0.333333    8
1  4.333333  2.666667    6
2  3.000000  1.333333    7
3  3.000000  5.333333   15
4  0.333333  6.666667    9
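A self-contained sketch of the same update call. One assumption is added here that is not part of the answer: the matched columns are cast to float first, which avoids relying on update to change an integer column's dtype (newer pandas versions may warn about that).

```python
import pandas as pd

df = pd.DataFrame({
    "d1": [5, 13, 9, 9, 1],
    "d2": [1, 8, 4, 16, 20],
    "abc": [8, 6, 7, 15, 9],
})

# Assumption: cast the target columns to float so the divided values fit
# without a dtype change during the update.
df = df.astype({"d1": float, "d2": float})

# update modifies df in place and returns None.
returned = df.update(df.filter(regex="d.*") / 3)

print(returned)
print(df)
```

Because update returns None, writing df = df.update(...) would discard the dataframe; the call is meant to be used as a statement on its own.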

Inline copy
Use pd.DataFrame.assign
I use the double splat ** to unpack the argument dataframe into a dictionary where column names are keys and the series that are the columns are the values. This matches the required signature for assign and overwrites those columns in the copy that is produced. In short, this is a copy of the calling dataframe with the columns overwritten appropriately.

df.assign(**df.filter(regex='d.*').div(3))

         d1        d2  abc
0  1.666667  0.333333    8
1  4.333333  2.666667    6
2  3.000000  1.333333    7
3  3.000000  5.333333   15
4  0.333333  6.666667    9
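To make the unpacking step concrete, here is a minimal sketch. It relies on a dataframe behaving like a mapping (keys() returns the column labels and indexing by a label returns that column), which is what ** needs. It also assumes all column labels are strings, since keyword names have to be.

```python
import pandas as pd

df = pd.DataFrame({
    "d1": [5, 13, 9, 9, 1],
    "d2": [1, 8, 4, 16, 20],
    "abc": [8, 6, 7, 15, 9],
})

scaled = df.filter(regex="d.*").div(3)

# What ** effectively passes to assign: column label -> Series.
as_kwargs = dict(scaled)
print(list(as_kwargs))  # ['d1', 'd2']

# assign returns a new dataframe; df itself is left unchanged.
new_df = df.assign(**scaled)
print(new_df)
print(df["d1"].tolist())  # [5, 13, 9, 9, 1]
```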

edited Jan 15, 2018 at 8:37

answered Jan 15, 2018 at 8:16

piRSquared


3 Comments

cs95 Over a year ago

Nice to see an answer from you every now and then. :)

piRSquared Over a year ago

Thanks @cᴏʟᴅsᴘᴇᴇᴅ!

NumesSanguis Over a year ago

They were all good answers, but since this one is the most general, and short in code, I'll accept this one.

jezrael · Answer · 2018-01-15 07:25:49Z

10

I think you need to extract the column names and assign:

df[df.filter(regex=("d.*")).columns] = df.filter(regex=("d.*")).divide(2)

Or:

cols = df.columns[df.columns.str.contains('^d.*')]
df[cols] /= 2
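A quick sketch to check that the two snippets pick the same columns and that assigning through df[cols] changes only those columns. The sample values are the ones from the setup shown in the first answer; the variable names are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "d1": [5, 13, 9, 9, 1],
    "d2": [1, 8, 4, 16, 20],
    "abc": [8, 6, 7, 15, 9],
})

# Column labels selected by each approach.
cols_from_filter = df.filter(regex="d.*").columns
cols_from_contains = df.columns[df.columns.str.contains("^d.*")]
print(list(cols_from_filter), list(cols_from_contains))

# Assigning to df[cols] replaces whole columns, so the int -> float
# change caused by the division is not a problem here.
df[cols_from_contains] = df[cols_from_contains] / 2
print(df)
```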

edited Jan 15, 2018 at 7:25

answered Jan 15, 2018 at 7:19

jezrael


cs95 · Answer · 2018-01-15 07:25:57Z

9

Use df.columns.str.startswith.

c = df.columns.str.startswith('d')    
df.loc[:, c] /= 2

As an example, consider:

df

   d1  d2  abc
0   5   1    8
1  13   8    6
2   9   4    7
3   9  16   15
4   1  20    9

c = df.columns.str.startswith('d')  
c
array([ True,  True, False], dtype=bool)

df.loc[:, c] /= 3    # 3 instead of 2, just for example
df

         d1        d2  abc
0  1.666667  0.333333    8
1  4.333333  2.666667    6
2  3.000000  1.333333    7
3  3.000000  5.333333   15
4  0.333333  6.666667    9
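The same steps as a runnable sketch. One assumption is added that is not in the answer: the d columns are cast to float up front, because writing fractional results into integer columns through .loc can trigger a dtype warning in newer pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    "d1": [5, 13, 9, 9, 1],
    "d2": [1, 8, 4, 16, 20],
    "abc": [8, 6, 7, 15, 9],
}).astype({"d1": float, "d2": float})  # assumption: float targets

# Boolean mask over the column labels.
c = df.columns.str.startswith("d")
print(c)

# Divide only the masked columns; every row is kept via the ":" slice.
df.loc[:, c] /= 3
print(df)
```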

If you need to pass a regex, use str.contains:

c = df.columns.str.contains(p) # p => your pattern

And the rest of your code follows.
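For instance, with a pattern that startswith cannot express. Both the pattern and the extra date column below are hypothetical, chosen only for illustration; the d columns are created as floats so the division needs no dtype change.

```python
import pandas as pd

df = pd.DataFrame({
    "d1": [5.0, 13.0],
    "d2": [1.0, 8.0],
    "date": [20180115, 20180116],  # hypothetical column that also starts with "d"
    "abc": [8, 6],
})

p = r"^d\d+$"  # hypothetical pattern: "d" followed only by digits
c = df.columns.str.contains(p)
print(c)

# Only d1 and d2 are divided; "date" and "abc" are left alone.
df.loc[:, c] /= 2
print(df)
```

Here startswith('d') would also have selected date, which is the kind of case where a regex is needed.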

edited Jan 15, 2018 at 7:25

answered Jan 15, 2018 at 7:20

cs95


4 Comments

NumesSanguis Over a year ago

Thanks as well. Your answer solves the example in the question, but I only used that example because it is simplified. My own code needs a more complex regex, so startswith doesn't cut it in that case.

cs95 Over a year ago

@NumesSanguis Then use df.columns.str.contains, and pass a regex. Still simpler.

redacted Over a year ago

df.loc[:, c] /= 2 is lovely, never thought of that one!

cs95 Over a year ago

@RobinNemeth Yup, you would've seen it in my answer first ;)
