Get substring in Python

Question

I have variables that hold email subject lines such as these:

Snap: Processor
 'ir.basisswap-1702|sydney-ir.basisswap-ricsxml-location_mapping' for
 '20181231' failed [Production2]

and Snap: 'ir.broker.caplet.vol' RBS data valucheck failed [production]

Desired output:

I want to extract the text between Snap: and failed:

Processor 'ir.basisswap-1702|sydney-ir.basisswap-ricsxml-location_mapping' for '20181231' and 'ir.broker.caplet.vol' RBS data valucheck

import re
regex1 = r'Snap:\s*(\S+)'
a = re.findall(regex1, mail["Subject"])

Actual output:

Processor for the first subject and ir.broker.caplet.vol for the second

You're capturing only the characters up to (but not including) the next space. I don't have a proper keyboard to answer properly, but perhaps that will get you going.
– Mark Smith, Commented Dec 31, 2018 at 20:39

Barmar · Accepted Answer · 2018-12-31 20:38:10Z


\S+ only matches a sequence of non-whitespace characters, so the match ends at the next space.

You want to match until the word failed, so use:

regex1 = r'Snap:\s*(.+?)\s+failed'

You need to use a non-greedy +? quantifier so that it only matches up to the first failed.
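
As a rough sketch of the difference, the snippet below runs the pattern on the two subjects from the question, written here on one line each. The helper name extract_snap and the third subject with two occurrences of failed are made up for illustration and are not from the question.

```python
import re

# Subjects from the question, written on a single line each.
subjects = [
    "Snap: Processor 'ir.basisswap-1702|sydney-ir.basisswap-ricsxml-location_mapping' for '20181231' failed [Production2]",
    "Snap: 'ir.broker.caplet.vol' RBS data valucheck failed [production]",
]

lazy = re.compile(r'Snap:\s*(.+?)\s+failed')    # non-greedy: stops at the first "failed"
greedy = re.compile(r'Snap:\s*(.+)\s+failed')   # greedy: runs on to the last "failed"


def extract_snap(subject, pattern=lazy):
    """Return the text between 'Snap:' and 'failed', or None if there is no match."""
    match = pattern.search(subject)
    return match.group(1) if match else None


results = [extract_snap(s) for s in subjects]
for r in results:
    print(r)

# Hypothetical subject with two occurrences of "failed", to show why +? matters.
twice = "Snap: job failed after retry failed [test]"
print(extract_snap(twice))          # job
print(extract_snap(twice, greedy))  # job failed after retry
```

Using search with a None check, rather than indexing into the findall result, avoids an IndexError on subjects that do not contain the Snap: ... failed shape.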

If the subjects contain newline characters, you should also use the re.DOTALL flag so that . will match newline.
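
To illustrate the newline case, here is a minimal sketch that assumes the first subject really does arrive with embedded line breaks, laid out the way it is shown in the question. The final whitespace-collapsing step is an optional extra, not something the answer prescribes.

```python
import re

# Assumption: the subject contains line breaks exactly where the question shows them.
subject = (
    "Snap: Processor\n"
    " 'ir.basisswap-1702|sydney-ir.basisswap-ricsxml-location_mapping' for\n"
    " '20181231' failed [Production2]"
)

regex1 = r'Snap:\s*(.+?)\s+failed'

# Without DOTALL, "." cannot cross a newline, so nothing matches.
without_flag = re.findall(regex1, subject)

# With DOTALL, "." also matches "\n", so the capture can span the lines.
with_flag = re.findall(regex1, subject, re.DOTALL)

# Optional: collapse the captured line breaks and indentation into single spaces.
cleaned = [" ".join(text.split()) for text in with_flag]

print(without_flag)  # []
print(cleaned)
```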
