2

I have a pandas dataframe column with strings that looks like this:

Column A

text moretext 251 St. Louis Apt.54
123 Orange Drive
sometext somemoretext 171 Poplar street
textnew 11th street 
77 yorkshire avenue

I want to remove the text before the first numeric value in each string, i.e., I want the output to look like this:

Column A

251 St. Louis Apt.54
123 Orange Drive
171 Poplar street
11th street 
77 yorkshire avenue

2 Answers

5

Let's use regex and extract:

df['Column A'] = df['Column A'].str.extract(r'(\d+.+$)')

Output:

0    251 St. Louis Apt.54
1        123 Orange Drive
2       171 Poplar street
3             11th street
4     77 yorkshire avenue
Name: Column A, dtype: object

The regex captures a group that starts at the first run of digits (\d+) and continues through the end of the string (.+$).
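As a rough end-to-end check of the pattern, here is a minimal sketch that rebuilds the column from the question's sample strings. The extra "no digits here" row is hypothetical, added only to show what happens when nothing matches, and the use of expand=False and fillna is an assumption about what you might want rather than part of the original answer.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical row with no digits.
df = pd.DataFrame({
    "Column A": [
        "text moretext 251 St. Louis Apt.54",
        "123 Orange Drive",
        "sometext somemoretext 171 Poplar street",
        "textnew 11th street",
        "77 yorkshire avenue",
        "no digits here",  # hypothetical: no numeric value at all
    ]
})

# expand=False returns a Series (not a DataFrame) for a single capture group.
extracted = df["Column A"].str.extract(r"(\d+.+$)", expand=False)

# Rows where the pattern does not match come back as NaN;
# here we fall back to the original text for those rows.
df["Column A"] = extracted.fillna(df["Column A"])

print(df["Column A"].tolist())
```

Whether to keep the original text or leave NaN for rows without digits depends on the data; drop the fillna call if missing values are preferable.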

2

This function finds the index of the first numeric character in the string and returns the rest of the string from that position. It is then applied to each value in the column using apply.

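def change(string):
    # Return the part of the string that starts at its first digit.
    for i, c in enumerate(string):
        if c.isdigit():
            return string[i:]
    # No digit found: leave the string unchanged.
    return string

# Series.apply takes no axis argument; it calls change on each value.
data['Column A'] = data['Column A'].apply(change)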
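The same "drop everything before the first digit" idea can also be written as a regex substitution instead of an explicit loop. The sketch below is an alternative, not the answer's method: the helper name strip_leading_non_digits is hypothetical, and it uses only the standard re module on the question's sample strings.

```python
import re

def strip_leading_non_digits(text):
    # ^\D+ matches the run of non-digit characters at the start of the string;
    # replacing it with "" leaves the text from the first digit onward.
    return re.sub(r"^\D+", "", text)

samples = [
    "text moretext 251 St. Louis Apt.54",
    "123 Orange Drive",
    "sometext somemoretext 171 Poplar street",
    "textnew 11th street",
    "77 yorkshire avenue",
]

cleaned = [strip_leading_non_digits(s) for s in samples]
print(cleaned)
```

Note that a string containing no digits at all becomes an empty string with this pattern. In pandas, the same pattern can be passed to Series.str.replace with regex=True, which avoids a Python-level apply.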

1 Comment

I'd suggest adding some explanation to your answer if you want to make it useful.
