
Hello, I have a df such as:

COL1   COL2
G1     QANH010008.1:18255-18820(-):Hab_ob
G1     QANH010002:7-10(-):Hab_ob

and I would like to create two new columns, COL3 and COL4, containing the number before the first - and the number after the first -.

Here the output should be:

COL1   COL2                                COL3   COL4
G1     QANH010008.1:18255-18820(-):Hab_ob  18255  18820
G1     QANH010002:7-10(-):Hab_ob           7      10 

1 Answer


You can use named capturing groups for this, then join the result to the original DataFrame. This answer incorporates a couple of suggestions from @MarkWang.

df.join(df['COL2'].str.extract(r'(?P<COL3>\d+)\-(?P<COL4>\d+)')) 

Output:

Out[206]: 
  COL1                                COL2   COL3   COL4
0   G1  QANH010008.1:18255-18820(-):Hab_ob  18255  18820
1   G1           QANH010002:7-10(-):Hab_ob      7     10
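A self-contained sketch of the same approach, with the sample data rebuilt from the question. Note that str.extract returns the captured text as strings, so the extra astype(int) step (not part of the answer above) converts COL3 and COL4 to integers. It assumes every row contains a match; a row without one would yield NaN and make the integer cast fail.

```python
import pandas as pd

# Rebuild the sample data from the question
df = pd.DataFrame({
    "COL1": ["G1", "G1"],
    "COL2": ["QANH010008.1:18255-18820(-):Hab_ob",
             "QANH010002:7-10(-):Hab_ob"],
})

# Named groups (?P<name>...) become the column names of the extracted frame;
# str.extract keeps only the first match in each string.
coords = df["COL2"].str.extract(r"(?P<COL3>\d+)-(?P<COL4>\d+)")

# The captured values are strings; cast them to integers before joining
# (assumption: every row matched, otherwise NaN would break the cast)
out = df.join(coords.astype(int))
print(out)
```

Because the first "digits-digits" run is the coordinate pair in both sample rows, the regex picks 18255/18820 and 7/10 rather than any digits from the QANH... prefix.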

2 Comments

Good approach, in particular the ?P part. Two comments: (1) expand=True is already the default, so it can be dropped; (2) I would use 'df.join(df['COL2'].str.extract(...))' rather than pd.concat to save some space.
@MarkWang: updated answer taking into account your suggestions. Much shorter now! Thank you :)
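For reference, a minimal sketch comparing the two variants discussed in the comments. It assumes a pandas version where expand=True is the default for Series.str.extract (the case since pandas 0.23), so passing it explicitly changes nothing, and that pd.concat(..., axis=1) lines up with df.join here because both frames share the same index.

```python
import pandas as pd

df = pd.DataFrame({
    "COL1": ["G1", "G1"],
    "COL2": ["QANH010008.1:18255-18820(-):Hab_ob",
             "QANH010002:7-10(-):Hab_ob"],
})
pattern = r"(?P<COL3>\d+)-(?P<COL4>\d+)"

# expand=True is the default, so both calls return a DataFrame
extracted = df["COL2"].str.extract(pattern)
extracted_explicit = df["COL2"].str.extract(pattern, expand=True)

# Column-wise concat vs. join on the shared index
via_concat = pd.concat([df, extracted_explicit], axis=1)
via_join = df.join(extracted)

print(via_concat.equals(via_join))
```

The join form is shorter, but both align rows by index, so they only agree when the extracted frame keeps the original index (which str.extract does).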
