I am trying to extract substrings from a long string in Python 3.

import re

def get_data(text):
    # Current attempt: only works when 'EMP' follows the name
    initials = text.split()[1]
    names = re.search(initials + '(.*)EMP', text).group(1).strip().title()

    return initials, names

I need the following outputs:

x,y = get_data('J JS JOHN SMITH EMP 223456')
JS
John Smith

x,y = get_data('J JB JOE BLOGGS CONT 223456')
JB
Joe Bloggs

x,y = get_data('J JS JOHN SMITH 223456')
JS
John Smith

I can do it with either EMP or CONT, but I am struggling to handle EMP, CONT, or neither. I'm new to regex, so any help is appreciated.

1 Answer

No need to do a split and then search.

You can use a single regex with re.findall, re.search, or re.match:

^\S+\s+(\S+)\s+(.+?)(?:\s+(?:EMP|CONT))?\s+\d+
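As a minimal sketch, this pattern could be dropped into the question's get_data roughly as follows. The use of re.match, the NAME_RE constant, and raising ValueError on a non-matching line are assumptions made for illustration, not part of the original question.

```python
import re

# Pattern from the answer: group 1 is the initials, group 2 is the name.
NAME_RE = re.compile(r'^\S+\s+(\S+)\s+(.+?)(?:\s+(?:EMP|CONT))?\s+\d+')

def get_data(text):
    match = NAME_RE.match(text)
    if match is None:
        # Assumption: treat an unexpected line format as an error.
        raise ValueError(f'unrecognised line: {text!r}')
    initials, names = match.groups()
    return initials, names.title()

for line in ('J JS JOHN SMITH EMP 223456',
             'J JB JOE BLOGGS CONT 223456',
             'J JS JOHN SMITH 223456'):
    print(get_data(line))
```

Because the EMP/CONT part is wrapped in an optional non-capturing group, the same call should handle all three input shapes from the question.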

RegEx Details:

  • ^: Start
  • \S+: Match 1+ non-whitespaces
  • \s+: Match 1+ whitespaces
  • (\S+): Match 1+ non-whitespaces and capture in group #1
  • \s+: Must be followed by 1+ whitespaces
  • (.+?): Lazily match 1+ of any character and capture in group #2
  • (?:\s+(?:EMP|CONT))?: Optionally match EMP or CONT after 1+ whitespaces
  • \s+\d+: Followed by 1+ whitespaces and 1+ digits
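The breakdown above can also be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores whitespace in the pattern and allows inline comments. This is only a sketch; the name VERBOSE_NAME_RE is hypothetical.

```python
import re

# Same pattern as in the answer, spread over lines with re.VERBOSE.
VERBOSE_NAME_RE = re.compile(r"""
    ^                       # start of the string
    \S+                     # first token, not captured
    \s+                     # 1+ whitespaces
    (\S+)                   # group 1: the initials
    \s+                     # 1+ whitespaces
    (.+?)                   # group 2: the name, matched lazily
    (?:\s+(?:EMP|CONT))?    # optional EMP or CONT marker
    \s+\d+                  # 1+ whitespaces, then 1+ digits
""", re.VERBOSE)

m = VERBOSE_NAME_RE.match('J JB JOE BLOGGS CONT 223456')
print(m.group(1), m.group(2).title())
```

The lazy (.+?) matters here: a greedy (.+) would swallow EMP or CONT into the name, since the marker group is optional.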