I am trying to extract a number from each string in a pandas Series. For example, consider this series:
s = pd.Series(['a-b-1', 'a-b-2', 'c1-d-5', 'c1-d-9', 'e-10-f-1-3.xl', 'e-10-f-2-7.s'])
0            a-b-1
1            a-b-2
2           c1-d-5
3           c1-d-9
4    e-10-f-1-3.xl
5     e-10-f-2-7.s
dtype: object
There are 6 rows and three known string templates. The goal is to extract one number from each row, where the position of the number depends on the template. Here is what I came up with:
s.str.extract('a-b-([0-9])|c1-d-([0-9])|e-10-f-[0-9]-([0-9])')
and this correctly extracts the numbers that I want from each row:
     0    1    2
0    1  NaN  NaN
1    2  NaN  NaN
2  NaN    5  NaN
3  NaN    9  NaN
4  NaN  NaN    3
5  NaN  NaN    7
However, since the regex has three capture groups, the result has three columns. So here is my question:
Can I write a regex with a single capture group, so that the result is a single column? Or do I need to coalesce the three columns into one, and if so, how can I do that without a loop?
Desired outcome would be a series like:
0 1
1 2
2 5
3 9
4 3
5 7
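For the coalescing route, one loop-free possibility is to back-fill along the columns and keep the first one. This is only a sketch, assuming each row matches exactly one of the three alternatives, so at most one column per row is non-null; the names `pattern`, `extracted`, and `coalesced` are mine.

```python
import pandas as pd

s = pd.Series(['a-b-1', 'a-b-2', 'c1-d-5', 'c1-d-9',
               'e-10-f-1-3.xl', 'e-10-f-2-7.s'])

# Original three-group pattern: one column per capture group.
pattern = r'a-b-([0-9])|c1-d-([0-9])|e-10-f-[0-9]-([0-9])'
extracted = s.str.extract(pattern)

# Back-fill across columns so the first non-null value in each row
# ends up in column 0, then keep only that column.
coalesced = extracted.bfill(axis=1).iloc[:, 0]

print(coalesced.tolist())
```

The values are still strings at this point; `coalesced.astype(int)` should convert them if every row matched.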
(?:a-b|c1-d|e-10-f-[0-9])-([0-9]) (see regex101.com/r/GPFI94/1)
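A minimal sketch of how that single-group pattern could be used with the series from the question. The prefixes go into a non-capturing group `(?:...)`, so only the final digit is captured. Passing `expand=False` is my addition; with a single capture group it makes `str.extract` return a Series instead of a one-column DataFrame.

```python
import pandas as pd

s = pd.Series(['a-b-1', 'a-b-2', 'c1-d-5', 'c1-d-9',
               'e-10-f-1-3.xl', 'e-10-f-2-7.s'])

# The three known prefixes are alternatives inside a non-capturing group,
# followed by '-' and the one captured digit.
pattern = r'(?:a-b|c1-d|e-10-f-[0-9])-([0-9])'

# One capture group + expand=False -> a single Series of strings.
result = s.str.extract(pattern, expand=False)

print(result.tolist())
```

Rows that match none of the templates would come back as NaN, so it may be worth checking `result.isna().any()` before converting to a numeric dtype.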