I have some dataframes whose id column looks like this:

A12-B-56
E1234B115

Each id is always some letters followed by several digits, then either -B- or B. I want to keep the substring before '-B-' or 'B'. One way I came up with is a for loop with re.split('(\d+)', some_text). Is there a faster way to do this?

  • df['column'].str.replace('\-','',regex=True) Commented Dec 14, 2022 at 3:49
  • @wwnde It does not solve my problem. I only need the first part, A12, of the data. Commented Dec 14, 2022 at 3:53
  • Can you describe the exact criteria for the match more clearly? Maybe provide some additional examples? Is it always the first three characters? Or always a letter followed by a variable number of digits, and you want to split before the next non-numeric character? Commented Dec 14, 2022 at 3:54
  • @MichaelDelgado I edited my question to make it clearer. Commented Dec 14, 2022 at 4:06

1 Answer

Use a lookahead assertion to capture the leading alphanumerics that are followed by B. Strip the hyphens first so that both formats reduce to the same shape, as in the code below:

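import pandas as pd

df = pd.DataFrame({'column': ['A12-B-56', 'A123B567']})

# Remove hyphens, then capture the leading word characters that are followed by 'B'
df = df.assign(column=df['column'].str.replace('-', '', regex=False).str.extract(r'(^\w+(?=B))', expand=False))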
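
To see what the lookahead does on its own, here is a minimal sketch using the standard re module on the two sample ids from the question. The helper name prefix_before_b is made up for illustration and is not part of the original answer.

```python
import re

# Hypothetical helper for illustration only.
def prefix_before_b(text):
    # Drop hyphens first, so 'A12-B-56' becomes 'A12B56'.
    cleaned = text.replace('-', '')
    # \w+ is greedy, so it backtracks to the last position that is followed by 'B';
    # the lookahead (?=B) checks for 'B' without including it in the match.
    match = re.match(r'\w+(?=B)', cleaned)
    return match.group(0) if match else None

print(prefix_before_b('A12-B-56'))   # A12
print(prefix_before_b('E1234B115'))  # E1234
```

Because the lookahead does not consume the B, only the part before it ends up in the result.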

As proposed by @mozway, this can be shortened to a concise one-liner:

df['column'].str.extract(r'(^\w+)-?B')
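
A minimal sketch of the one-liner applied to the question's sample ids, assuming pandas is imported as pd. Passing expand=False makes str.extract return a Series instead of a one-column DataFrame. The stricter pattern in the second call is an assumption based on the question's description ("some letters and then several numbers") and is not part of the original answer; the column names prefix and prefix_strict are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'column': ['A12-B-56', 'E1234B115']})

# One-step extraction: capture the leading word characters
# that come before an optional '-' and a 'B'.
df['prefix'] = df['column'].str.extract(r'(^\w+)-?B', expand=False)

# Assumed stricter variant: letters, then digits, then an optional '-' and a 'B'.
df['prefix_strict'] = df['column'].str.extract(r'^([A-Za-z]+\d+)-?B', expand=False)

print(df)
```

Rows that do not match the pattern come back as NaN, which makes unexpected id formats easy to spot.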

1 Comment

In one step df['column'].str.extract('(^\w+)-?B') ;)
