Extract part of a column using regex or split in python

Question

Hello, I have a DataFrame such as:

COL1   COL2
G1     QANH010008.1:18255-18820(-):Hab_ob
G1     QANH010002:7-10(-):Hab_ob

I would like to create two new columns, COL3 and COL4, holding the number before the first - and the number after the first -, respectively.

The output should be:

COL1   COL2                                COL3   COL4
G1     QANH010008.1:18255-18820(-):Hab_ob  18255  18820
G1     QANH010002:7-10(-):Hab_ob           7      10

mechanical_meat · Accepted Answer · 2020-05-10 14:41:53Z

You can use named capturing groups for this, then join the result to the original DataFrame. The group names become the new column names. This answer incorporates a couple of suggestions from @MarkWang.

df.join(df['COL2'].str.extract(r'(?P<COL3>\d+)\-(?P<COL4>\d+)'))

Output:

Out[206]: 
  COL1                                COL2   COL3   COL4
0   G1  QANH010008.1:18255-18820(-):Hab_ob  18255  18820
1   G1           QANH010002:7-10(-):Hab_ob      7     10
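For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question, and it adds two things that are not in the one-liner above: a conversion of the new columns to integers (`str.extract` returns strings), and a split-based alternative, since the title asks about split as well. The split version assumes every value has the shape `id:start-end(strand):label`; the variable names `extracted`, `result`, `coords`, and `parts` are only illustrative.

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({
    "COL1": ["G1", "G1"],
    "COL2": [
        "QANH010008.1:18255-18820(-):Hab_ob",
        "QANH010002:7-10(-):Hab_ob",
    ],
})

# Regex approach: the first "digits-digits" run in each string is captured,
# and the named groups become the column names of the extracted frame.
extracted = df["COL2"].str.extract(r"(?P<COL3>\d+)-(?P<COL4>\d+)")

# str.extract yields strings, so convert if numeric columns are wanted.
# (astype(int) would fail on rows with no match, which come back as NaN.)
extracted = extracted.astype(int)

result = df.join(extracted)
print(result)

# Split approach, assuming the fixed layout "id:start-end(strand):label":
# take the middle field, drop the "(strand)" suffix, then split on "-".
coords = df["COL2"].str.split(":").str[1].str.split("(").str[0]
parts = coords.str.split("-", expand=True).astype(int)
parts.columns = ["COL3", "COL4"]
print(parts)
```

The regex version is more tolerant of variations in the surrounding text, while the split version breaks if the number of `:` separated fields changes.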

edited May 10, 2020 at 14:41

answered May 10, 2020 at 14:37


2 Comments

Mark Wang Over a year ago

Good approach, in particular the ?P part. Two comments: (1) expand=True should be the default, I guess; (2) I would use 'df.join(df['COL2'].str.extract)' rather than pd.concat to save some space.

mechanical_meat Over a year ago

@MarkWang: updated answer taking into account your suggestions. Much shorter now! Thank you :)
