Pandas regex, replace group with char

Question

Problem

How to replace X with _, given the following dataframe:

data = {'street':['13XX First St', '2XXX First St', '47X Second Ave'], 
        'city':['Ashland', 'Springfield', 'Ashland']} 
df = pd.DataFrame(data)

The streets need to be edited, replacing each X with an underscore _.

Notice that the number of digits varies, as does the number of Xs. Also, street names such as Xerxes should not be edited to _er_es; they should be left unchanged. Only the street number should change.

Desired Output

data = {'street':['13__ First St', '2___ First St', '47_ Second Ave'], 
        'city':['Ashland', 'Springfield', 'Ashland']} 
df = pd.DataFrame(data)

Progress

Some potential regex building blocks include:
1. [0-9]+ to capture numbers
2. X+ to capture Xs
3. ([0-9]+)(X+) to capture groups

df['street'].replace("([0-9]+)(X+)", value=r"\2", regex=True, inplace=False)

I'm pretty weak with regex, so my approach may not be the best. Preemptive thank you for any guidance or solutions!
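A small sketch of why a backreference alone cannot produce the desired output. It uses the standard-library re module rather than pandas so that the behavior does not depend on the pandas version; the variable names are illustrative only. A replacement string such as \2 can only re-insert captured text; it cannot build a run of underscores whose length depends on the match.

```python
import re

pattern = r"([0-9]+)(X+)"

# \2 re-inserts the captured Xs and drops the digits held in group 1
attempt = re.sub(pattern, r"\2", "13XX First St")
print(attempt)      # XX First St

# A fixed replacement string keeps the digits but always emits one underscore
fixed_width = re.sub(pattern, r"\1_", "2XXX First St")
print(fixed_width)  # 2_ First St
```

This is why the answers below pass a function as the replacement: a function can compute the underscore run from the length of the match.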

You want to replace each X with _, as many times as X appears? So if it was 13XXX, you want 13___ (three underscores)?
– Umar.H, Commented Jan 9, 2020 at 16:48
@Datanovice exactly so, 2 Xs should be replaced by 2 _s. X -> _, XX -> __, XXX -> ___.
– MinneapolisCoder9, Commented Jan 13, 2020 at 19:47

Quang Hoang · Accepted Answer · 2020-01-09 17:08:22Z

3

IIUC, this would do:

def repl(m):
    return m.group(1) + '_'*len(m.group(2))

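# regex=True is required in recent pandas versions, where str.replace
# treats the pattern as a literal string by default
df['street'].str.replace(r"^([0-9]+)(X*)", repl, regex=True)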

Output:

0     13__ First St
1     2___ First St
2    47_ Second Ave
Name: street, dtype: object
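For reference, a self-contained sketch of the approach above that can be pasted and run as-is. The fourth row ("12X Xerxes Ave") is a hypothetical address added here to exercise the requirement that street names containing X stay untouched; it is not part of the original question's data.

```python
import pandas as pd

df = pd.DataFrame({
    # The last row is a made-up example, added to test a street name with X
    "street": ["13XX First St", "2XXX First St", "47X Second Ave", "12X Xerxes Ave"],
    "city": ["Ashland", "Springfield", "Ashland", "Ashland"],
})

def repl(m):
    # group(1): the leading digits, kept as-is; group(2): the run of Xs
    return m.group(1) + "_" * len(m.group(2))

# The ^ anchor restricts the match to the street number at the start of the string
result = df["street"].str.replace(r"^([0-9]+)(X*)", repl, regex=True)
print(result.tolist())
# ['13__ First St', '2___ First St', '47_ Second Ave', '12_ Xerxes Ave']
```

Because the pattern is anchored with ^ and requires leading digits, an address that does not start with a number is left unchanged.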

answered Jan 9, 2020 at 17:08


5 Comments

Umar.H Over a year ago

I couldn't get a function to work in df.replace. Do you know why? It replaces the entire string with <function repl at 0x000001C242C68268>

Quang Hoang Over a year ago

You need .str.replace, which accepts a function, not replace.

Umar.H Over a year ago

that's right, but if you wanted to make the change across the entire dataframe you would need to loop through every column to use str.replace right?

Quang Hoang Over a year ago

Yes, or df.apply(lambda x: x.str.replace(...))

SublimizeD Over a year ago

This is correct: you need str.replace to run this. It won't work with just df.replace. Good workaround.
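A rough sketch of the df.apply idea from the comments above, for applying the replacement across a whole DataFrame. The helper name mask_column and the numeric "units" column are hypothetical, added only for illustration. The sketch assumes that non-string columns should be skipped, since the .str accessor only works on string data.

```python
import pandas as pd

df = pd.DataFrame({
    "street": ["13XX First St", "47X Second Ave"],
    "city": ["Ashland", "Springfield"],
    "units": [4, 12],  # hypothetical non-string column
})

def repl(m):
    # Keep the leading digits, turn each X after them into an underscore
    return m.group(1) + "_" * len(m.group(2))

def mask_column(col):
    # Only string columns support .str; return everything else unchanged
    if pd.api.types.is_string_dtype(col):
        return col.str.replace(r"^([0-9]+)(X*)", repl, regex=True)
    return col

# DataFrame.apply calls mask_column once per column
masked = df.apply(mask_column)
print(masked)
```

Whether every string column should really be rewritten depends on the data; here the city column passes through the same pattern and simply never matches.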

Umar.H · Answer · 2020-01-09 17:13:24Z

2

IIUC, we can pass a function into the repl argument much like re.sub

def repl(m):
    return '_' * len(m.group())

# regex=True is required in recent pandas versions when repl is a function
df['street'].str.replace(r'([X])+', repl, regex=True)

out:

0     13__ First St
1     2___ First St
2    47_ Second Ave
Name: street, dtype: object

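If you need to match only Xs that directly follow a digit, we can add a lookbehind, '(?<=\d)', which requires a digit before the Xs without making it part of the match (so the digit is not counted by len(m.group()) and is not overwritten):

df['street'].str.replace(r'(?<=\d)X+', repl, regex=True)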
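To illustrate why it matters whether the digit is part of the match when the replacement function counts len(m.group()), here is a small comparison using the standard-library re module. The variable names and the "123 Xmas Street" string are illustrative; the latter is taken from a comment on another answer.

```python
import re

def repl(m):
    # One underscore per matched character
    return "_" * len(m.group())

street = "13XX First St"

# The digit is consumed by the match, so it is counted and overwritten too
consumed = re.sub(r"\dX+", repl, street)
print(consumed)    # 1___ First St

# The lookbehind only checks for a digit; the match itself is just the Xs
lookbehind = re.sub(r"(?<=\d)X+", repl, street)
print(lookbehind)  # 13__ First St

# An X that does not directly follow a digit is left alone
untouched = re.sub(r"(?<=\d)X+", repl, "123 Xmas Street")
print(untouched)   # 123 Xmas Street
```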

edited Jan 9, 2020 at 17:13

answered Jan 9, 2020 at 17:05


SublimizeD · Answer · 2020-01-09 16:55:37Z

0

Assuming 'X' only occurs in the 'street' column

streetresult=re.sub('X','_',str(df['street']))

Your desired output should be the result

Code I tested

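import pandas as pd
import re

data = {'street':['13XX First St', '2XXX First St', '47X Second Ave'], 
        'city':['Ashland', 'Springfield', 'Ashland']} 
df = pd.DataFrame(data) 
# str() converts the whole column to its printed representation,
# so the result is a single string rather than a Series.
# Note that this replaces every X, including any in street names.
streetresult = re.sub('X', '_', str(df['street']))
print(streetresult)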

edited Jan 9, 2020 at 16:55

answered Jan 9, 2020 at 16:44


3 Comments

Quang Hoang Over a year ago

This will replace X in 123 Xmas Street as well.

SublimizeD Over a year ago

That is correct. Setting the regex to match an X only if it follows a \d (numeric value) or another 'X' should account for street names such as that, if I'm not mistaken.

MinneapolisCoder9 Over a year ago

@SublimizeD sorry, I hadn't made that clarification in the problem, but Quang's correct in pointing out that requirement. I'll edit the problem. Thank you!
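A minimal sketch of the rule suggested in the comments above (replace an X only when it follows a digit or another X), written with a lookbehind so that no replacement function is needed. The names streets, pattern, and masked are illustrative. The compiled pattern is applied through Series.map with Python's own re engine, which is an implementation choice made here, not something from the original answers.

```python
import re
import pandas as pd

streets = pd.Series(["13XX First St", "2XXX First St", "123 Xmas Street"])

# Match a single X whose preceding character is a digit or another X.
# The lookbehind inspects the original text, so every X in a run qualifies.
pattern = re.compile(r"(?<=[0-9X])X")

masked = streets.map(lambda s: pattern.sub("_", s))
print(masked.tolist())
# ['13__ First St', '2___ First St', '123 Xmas Street']
```

One caveat with this rule: a street name containing a doubled capital X would still have its second X replaced, so anchoring the pattern to the leading street number is the stricter choice.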
