I am building a regex in Python to extract header values from a forwarded email. I am only interested in the first occurrence of each of the headers below, and I only want to capture the text that appears after each colon.
From: ...
Sent: ...
To: ...
Subject: ...
The following regex works fine using re.search for most variations of the above format:
(?:From\s*:\s*)(.*)(?:\n*)(?:Sent\s*:\s*)(.*)(?:\n*)(?:To\s*:\s*)(.*)(?:\n*)(?:Subject\s*:\s*)(.*)
but sometimes the headers appear in a different order and some of them are missing, for example:
Sent: ...
From: ...
Subject: ...
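To make the problem concrete, here is a minimal reproduction. The sample text and the names HEADER_RE, sample and reordered are made up for illustration, and the pattern here ends with a (.*) after Subject so that value is captured too. It matches the first layout but finds nothing in the second one, because the pattern hard-codes the From, Sent, To, Subject order.

```python
import re

# Fixed-order pattern; each (.*) stops at the end of its line because
# "." does not match a newline by default.
HEADER_RE = re.compile(
    r"(?:From\s*:\s*)(.*)(?:\n*)"
    r"(?:Sent\s*:\s*)(.*)(?:\n*)"
    r"(?:To\s*:\s*)(.*)(?:\n*)"
    r"(?:Subject\s*:\s*)(.*)"
)

# Hypothetical forwarded-email text with all four headers in the expected order.
sample = (
    "Some intro text\n"
    "From: Alice <alice@example.com>\n"
    "Sent: Monday, 1 January 2024 10:00\n"
    "To: Bob <bob@example.com>\n"
    "Subject: Quarterly report\n"
    "\n"
    "Body text\n"
)

match = HEADER_RE.search(sample)
fixed_order = match.groups() if match else None

# Same kind of block, but reordered and without a "To" line.
reordered = (
    "Sent: Monday, 1 January 2024 10:00\n"
    "From: Alice <alice@example.com>\n"
    "Subject: Quarterly report\n"
)
reordered_match = HEADER_RE.search(reordered)  # no match for this layout
```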
I thought I could use a positive lookahead to match the headers in any order, but I could not get it to work. Does anyone know how this can be done efficiently? Any help is greatly appreciated.
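For reference, one direction that might work without lookaheads is to match one header line at a time and keep only the first value seen for each name. This is only a sketch of that different technique, not a lookahead solution. It assumes every header starts at the beginning of its own line, that header names are spelled exactly as shown, and that values do not wrap onto a second line. The names HEADER_LINE_RE and first_headers are hypothetical.

```python
import re

# One alternation covers all header names; re.MULTILINE lets ^ and $ anchor
# at every line. [ \t]* is used instead of \s* so that an empty value
# cannot swallow the newline and capture the following line.
HEADER_LINE_RE = re.compile(
    r"^(From|Sent|To|Subject)[ \t]*:[ \t]*(.*)$",
    re.MULTILINE,
)


def first_headers(text):
    """Return {header name: value} using the first occurrence of each header."""
    found = {}
    for m in HEADER_LINE_RE.finditer(text):
        # setdefault keeps the first value and ignores later repeats.
        found.setdefault(m.group(1), m.group(2).strip())
    return found


# Hypothetical input: reordered headers, no "To", and a repeated "From".
example = (
    "Sent: Monday, 1 January 2024 10:00\n"
    "From: Alice <alice@example.com>\n"
    "Subject: Quarterly report\n"
    "\n"
    "From: Someone Else <else@example.com>\n"
)
headers = first_headers(example)
```

Missing headers are simply absent from the returned dict, so headers.get("To") returns None for the example above. The text is scanned once regardless of the order in which the headers appear.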