Python Dataframe Get Substring

Question

I have some dataframes whose id column looks like this:

A12-B-56
E1234B115

Each id is always some letters followed by several digits, then either -B- or B. I want to keep the substring before the '-B-' or 'B'. One way I came up with is a for loop with re.split(r'(\d+)', some_text). Is there a faster way to do this?

@wwnde It does not solve my problem. I only need the first part of the data, A12.
– user398843, Commented Dec 14, 2022 at 3:53
Can you describe the exact criteria for the match more clearly, maybe with some additional examples? Is it always the first three characters? Or always a letter followed by a variable number of digits, so that you want to split before the next non-numeric character?
– Michael Delgado, Commented Dec 14, 2022 at 3:54

wwnde · Accepted Answer · 2022-12-14 05:59:28Z


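Use a lookahead assertion to capture the alphanumeric characters from the start of the string that are followed by B. It is wise to remove the hyphens first, so that both id formats reduce to the same pattern. Code below:

import pandas as pd

df = pd.DataFrame({'column': ['A12-B-56', 'A123B567']})

# Drop the hyphens, then capture the leading alphanumerics that are followed by B.
df = df.assign(column=df['column'].str.replace('-', '', regex=False).str.extract(r'(^\w+(?=B))', expand=False))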
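To see what the lookahead does on a single string, here is a minimal sketch using only the standard `re` module and ids from the question and the answer. The helper name `prefix_before_b` is hypothetical and not part of the original answer.

```python
import re

# Hypothetical helper for illustration: applies the same two steps as the
# pandas code (drop hyphens, then match up to a following "B") to one string.
def prefix_before_b(text):
    cleaned = text.replace("-", "")
    # (?=B) checks that "B" comes next without including it in the match.
    match = re.search(r"^\w+(?=B)", cleaned)
    return match.group(0) if match else None

ids = ["A12-B-56", "E1234B115", "A123B567"]
prefixes = [prefix_before_b(s) for s in ids]
print(prefixes)  # ['A12', 'E1234', 'A123']
```

Because `\w+` is greedy, the match runs up to the last B in the cleaned string. If the part after the separator could itself contain a B, this pattern would keep more than the leading prefix.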

As proposed by @mozway, this can be done in a single step, applied to the original column:

df['column'].str.extract(r'(^\w+)-?B')
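As a self-contained sketch of the one-step approach on the ids from the question: the column names `prefix` and `prefix_strict` are made up here, and the stricter pattern is an assumption based on the question's description (letters, then digits, then -B- or B), not something the answer proposes.

```python
import pandas as pd

# Sample ids taken from the question.
df = pd.DataFrame({"column": ["A12-B-56", "E1234B115"]})

# The one-step pattern; expand=False yields a Series instead of a
# one-column DataFrame, which is convenient when assigning a new column.
df["prefix"] = df["column"].str.extract(r"(^\w+)-?B", expand=False)

# Stricter variant (assumption): one or more letters, then digits,
# then an optional hyphen and "B".
df["prefix_strict"] = df["column"].str.extract(r"^([A-Za-z]+\d+)-?B", expand=False)

print(df)
```

Rows that do not match the pattern come back as NaN, so it may be worth checking for missing values afterwards.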

edited Dec 14, 2022 at 5:59

answered Dec 14, 2022 at 4:09


1 Comment

mozway Over a year ago

In one step df['column'].str.extract('(^\w+)-?B') ;)
