I want to extract the requirement number from a column named "Linked Issues" in a dataframe. The "Linked Issues" column contains strings in the following format:

Linked Issues
Requirement-12345, NewPr-8795, OldPr-78941
MSR-85749, Requirement-74852, NewPr-95418
Requirement-894895
OldPr-85974, NewPr-968572, Requirement-985785

Expected result:
I want to store the requirement number in a new column, like this:

Requirement Number
Requirement-12345
Requirement-74852
Requirement-894895
Requirement-985785
  • If "Requirement" is always written the same way in the column, you can use a regex and extract the value from each row using df.apply(). (Commented Apr 22, 2019 at 6:01)
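The df.apply() approach from the comment can be sketched with Python's re module. This is a minimal sketch: the helper name first_requirement is hypothetical, and it assumes that non-string cells (such as NaN) should simply produce None.

```python
import re

import pandas as pd

# Sample data from the question
df = pd.DataFrame({'Linked Issues': [
    'Requirement-12345, NewPr-8795, OldPr-78941',
    'MSR-85749, Requirement-74852, NewPr-95418',
    'Requirement-894895',
    'OldPr-85974, NewPr-968572, Requirement-985785',
]})

# Compile once: the literal text "Requirement-" followed by one or more digits
pattern = re.compile(r'Requirement-\d+')


def first_requirement(text):
    """Hypothetical helper: return the first Requirement-<digits> token, or None."""
    if not isinstance(text, str):  # guard against NaN / non-string cells
        return None
    match = pattern.search(text)
    return match.group(0) if match else None


# apply() calls the helper once per row of the column
df['Requirement Number'] = df['Linked Issues'].apply(first_requirement)
print(df['Requirement Number'].tolist())
# ['Requirement-12345', 'Requirement-74852', 'Requirement-894895', 'Requirement-985785']
```

The vectorized .str accessor used in the answer below does the same job without a helper function and handles NaN cells on its own.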

Answer

Use Series.str.extract with the regex r'(Requirement-\d+)', which matches the text "Requirement-" followed by one or more digits, to get the first matched value per row:

# take the first "Requirement-<digits>" match in each row
df['new'] = df['Linked Issues'].str.extract(r'(Requirement-\d+)')
print(df)
                                    Linked Issues                 new
0      Requirement-12345, NewPr-8795, OldPr-78941   Requirement-12345
1       MSR-85749, Requirement-74852, NewPr-95418   Requirement-74852
2                              Requirement-894895  Requirement-894895
3  OldPr-85974, NewPr-968572, Requirement-985785   Requirement-985785
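A self-contained variant of the extract call illustrates two details. It assumes a hypothetical second row that contains no requirement (not part of the question's data). Rows without a match get NaN, and the capture group decides what is returned, so moving the parentheses around \d+ keeps only the digits. Passing expand=False returns a Series instead of a one-column DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'Linked Issues': [
    'Requirement-12345, NewPr-8795, OldPr-78941',
    'MSR-85749, NewPr-95418',  # hypothetical row with no requirement
]})

# Whole token: the group wraps "Requirement-<digits>"
df['new'] = df['Linked Issues'].str.extract(r'(Requirement-\d+)', expand=False)

# Digits only: the group wraps just \d+
df['number'] = df['Linked Issues'].str.extract(r'Requirement-(\d+)', expand=False)

print(df['new'].isna().tolist())  # [False, True]
```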

If a row can contain more than one requirement number, use Series.str.findall with Series.str.join:

df['new'] = df['Linked Issues'].str.findall(r'(Requirement-\d+)').str.join(', ')
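Here is a runnable sketch of the findall/join variant. It uses hypothetical rows that are not in the question's data: one with two requirement numbers and one with none. findall returns a list of every match per row, and a row with no match yields an empty list, which join turns into an empty string rather than NaN.

```python
import pandas as pd

df = pd.DataFrame({'Linked Issues': [
    'Requirement-12345, NewPr-8795, Requirement-67890',  # hypothetical: two requirements
    'Requirement-894895',
    'OldPr-85974, NewPr-968572',                         # hypothetical: no requirement
]})

# findall -> list of all matches per row; str.join -> one comma-separated string per row
df['new'] = df['Linked Issues'].str.findall(r'(Requirement-\d+)').str.join(', ')
print(df['new'].tolist())
# ['Requirement-12345, Requirement-67890', 'Requirement-894895', '']
```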

Comment:

Thanks for the solution, it was a great help and worked perfectly. I checked some documentation and want to confirm: did we use the r in r'(Requirement-\d+)' to tell Python that we are dealing with a regular expression?
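On the question in that comment: the r prefix does not mark a string as a regular expression. It creates a raw string literal, in which backslashes are kept as-is instead of starting escape sequences. The regex engine still receives an ordinary str; the raw form just guarantees that \d reaches it as a backslash followed by d. The checks below illustrate this using only the standard library (the sample strings are taken from the question):

```python
import re

# A raw literal is still a plain str, not a compiled regex object
assert type(r'Requirement-\d+') is str

# r'\d' is two characters: a backslash and 'd', the same as the escaped '\\d'
assert r'\d' == '\\d'
assert len(r'\d') == 2

# So both spellings give the regex engine the same pattern and match identically
text = 'Requirement-12345, NewPr-8795'
assert re.findall(r'Requirement-\d+', text) == ['Requirement-12345']
assert re.findall('Requirement-\\d+', text) == ['Requirement-12345']

# Where it matters: without r, '\b' is a backspace character, not a word boundary
assert '\b' == '\x08'
assert re.findall(r'\bNewPr-\d+', text) == ['NewPr-8795']
assert re.findall('\bNewPr-\\d+', text) == []
```

Writing regex patterns as raw strings is therefore a convention for avoiding escaping mistakes, not something the re module or pandas requires.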
