I'm using pandas to analyze data from three different sources. Each source is imported into a DataFrame and needs cleaning, because the data was entered by hand and contains errors.

Specifically, I'm working with street names. Until now, I have been using .str.replace() to remove street types (st., street, blvd., ave., etc.), as shown below. This isn't working well enough, so I'd like to use a regex to match a pattern and replace each value in the column with just the part that the regex matched.

df['street'] = df['street'].str.replace(r' avenue+', '', regex=True)

I'd like the regex to keep only the following, and drop every other character from the address column's values: one or more digits, followed by a space, and then the first x alphabetic characters. For example, "3762 pearl street" might become "3762 pea" if x is 3, with the following regex:

(\d+ )+\w{0,3}

How can I use pandas' .str.replace to do this? I don't want to specify a fixed replacement string as the second argument. I want to replace the original string with whatever the regex matched. Something that, in my mind, might work like this:

df['street'] = df['street'].str.replace(ORIGINAL STRING, r'(\d+ )+\w{0,3}', regex=True)

which might turn "43 milford st." into "43 mil".

Thank you, please let me know if I'm being unclear.

1 Answer

You could use the .str.extract method to overwrite the column with the part of each value that matches the pattern:

pat = r'(\d+\s[a-zA-Z]{3})'
df['street'] = df['street'].str.extract(pat) 
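
For anyone who wants to try this out, here is a minimal sketch on a small made-up DataFrame. The sample rows and the `street_key` column name are assumptions for illustration, not from the question's data. Passing `expand=False` is my addition so that a single capture group comes back as a Series.

```python
import pandas as pd

# Hypothetical sample data, not taken from the question's real sources.
df = pd.DataFrame(
    {"street": ["3762 pearl street", "43 milford st.", "no number here"]}
)

# One capture group: digits, one whitespace character, three letters.
pat = r"(\d+\s[a-zA-Z]{3})"

# Write to a new (hypothetical) column so the original stays available
# for comparison; expand=False returns a Series for a single group.
df["street_key"] = df["street"].str.extract(pat, expand=False)

print(df)
```

Rows where the pattern finds nothing end up as NaN, so it may be worth checking for those before overwriting the original column.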

Just an observation: the regex you shared, (\d+ )+\w{0,3}, also matches inputs like the following and returns some odd results:

  • 1131 1313 street
  • 121 avenue
  • 1 1 1 1 1 1 avenue
  • 42 (with a trailing space, since \w{0,3} can match zero characters)

I've changed it up a bit based on what you described, but I'm not sure it works for all your data points.
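
If it helps, here is a rough way to compare the two patterns side by side using Python's built-in `re` module. The `first_match` helper and the sample strings are just for illustration (assumptions on my part), and `re.search` is used because it looks for the first match anywhere in the string.

```python
import re

ORIGINAL = r"(\d+ )+\w{0,3}"       # pattern from the question
REVISED = r"(\d+\s[a-zA-Z]{3})"    # pattern from this answer


def first_match(pattern, text):
    """Return the first matched substring, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


# Illustrative inputs; "42 " deliberately has a trailing space.
samples = [
    "3762 pearl street",
    "1131 1313 street",
    "1 1 1 1 1 1 avenue",
    "42 ",
]

for s in samples:
    print(repr(s), "|", repr(first_match(ORIGINAL, s)), "|", repr(first_match(REVISED, s)))
```

The original pattern keeps every repeated number group and accepts a number with no letters after it, while the revised one needs exactly one number followed by three letters.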

2 Comments

Thanks so much. I'm still very new to writing my own regular expressions, so what I had there was just my best attempt at it. Yours seems to work well and matches every example pattern I've tested. Thank you, I will give extract() a shot.
Oh, that's awesome! I find regex very useful but still have a lot to learn too. This game helped me a ton: regexcrossword.com