Problem
How to replace X with _, given the following dataframe:
data = {'street':['13XX First St', '2XXX First St', '47X Second Ave'],
'city':['Ashland', 'Springfield', 'Ashland']}
df = pd.DataFrame(data)
The streets need to be edited, replacing each X with an underscore _.
Notice that the number of Integers changes, as does the number of Xs. Also, street names such as Xerxes should not be edited to _er_es, but rather left unedited. Only the street number section should change.
Desired Output
data = {'street':['13__ First St', '2___ First St', '47_ Second Ave'],
'city':['Ashland', 'Springfield', 'Ashland']}
df = pd.DataFrame(data)
Progress
Some potential regex building blocks include:
1. [0-9]+ to capture numbers
2. X+ to capture Xs
3. ([0-9]+)(X+) to capture groups
df['street']replace("[0-9]+)(X+)", value=r"\2", regex=True, inplace=False)
I'm pretty weak with regex, so my approach may not be the best. Preemptive thank you for any guidance or solutions!
_with the number of times X appears? is if it was13XXXthen you want13___(three underscores) ?