Python Regex To Ignore Date Pattern

Question

Sample Data:

Weight Measured: 80.7 kg (11/27/1900 24:59:00)
Pulse 64 \F\ Temp 37.3?C (99.1 ?F) \F\ Wt 101.2 kg (223 lb)
Weight as of 11/11/1900 72.2 kg (159 lb 1.6 oz)
Resp. rate 16, height 177.8 cm (5' 10"), weight 84.7 kg (186 lb|
11.2 oz)
And one extra weight example 100lbs

Partially working Regex:

\b(?i)(?:weight|wt)\b(?:.){1,25}?\b(\d+\.?(?:\d+)).*?(\w+)\b

Current output:

('80.7', 'kg'), ('101.2', 'kg'), ('11', '11'), ('84.7', 'kg'), ('100', 'lbs')

Expected output:

('80.7', 'kg'), ('101.2', 'kg'), ('72.2', 'kg'), ('84.7', 'kg'), ('100', 'lbs')

How do I make my current regex ignore dates and capture the value that follows them? Also, how do I make this regex stop matching at the end of the line?

Remove the dates before running your regex. It's a simple, failsafe pattern.
– Jongware, Commented Jan 21, 2020 at 22:11
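Jongware's suggestion could be sketched roughly as follows: strip anything that looks like a date, then run a simpler weight pattern that needs no lookarounds. The `DATE` pattern is an assumption (M/D/YYYY with an optional trailing time, as in the sample data), and the names `DATE`, `WEIGHT`, `cleaned` and `weights` are illustrative only.

```python
import re

text = (
    "Weight Measured: 80.7 kg (11/27/1900 24:59:00)\n"
    "Weight as of 11/11/1900 72.2 kg (159 lb 1.6 oz)"
)

# Assumed date shape: M/D/YYYY, optionally followed by H:MM or H:MM:SS.
DATE = re.compile(r'\b\d{1,2}/\d{1,2}/\d{4}(?:\s+\d{1,2}:\d{2}(?::\d{2})?)?')

# Simplified weight pattern: with the dates gone, no lookarounds are needed.
# [^\S\r\n]* allows only horizontal whitespace between number and unit.
WEIGHT = re.compile(r'(?i)\bw(?:eigh)?t\b.{1,25}?\b(\d+(?:\.\d+)?)[^\S\r\n]*(\w+)')

cleaned = DATE.sub('', text)       # drop every date (and trailing time)
weights = WEIGHT.findall(cleaned)  # list of (value, unit) tuples
print(weights)  # [('80.7', 'kg'), ('72.2', 'kg')]
```

This only works if every date in the input fits the assumed shape; other formats would need their own pattern.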
Ok, so nothing or whitespace. Use r'(?i)\bw(?:eigh)?t\b.{1,25}?\b(?<!\d/)(\d+(?:\.\d+)?)(?!/?\d)\s*(\w+)'
– Wiktor Stribiżew, Commented Jan 21, 2020 at 22:24
Well, \s can match newlines; replace it with [^\S\r\n] to match only horizontal whitespace.
– Wiktor Stribiżew, Commented Jan 21, 2020 at 22:33
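To illustrate the last comment, here is a small check on a made-up two-line input (not taken from the question) in which the number has no unit on its own line. With `\s*` the match runs across the newline and picks up the first word of the next line; with `[^\S\r\n]*` it does not. The variable names are illustrative.

```python
import re

# Hypothetical input: the weight has no unit before the line ends.
sample = "Weight 80\nPulse 64"

# \s* can consume the newline, so the "unit" comes from the next line.
loose = re.findall(
    r'(?i)\bw(?:eigh)?t\b.{1,25}?\b(?<!\d/)(\d+(?:\.\d+)?)(?!/?\d)\s*(\w+)',
    sample,
)

# [^\S\r\n]* matches only horizontal whitespace, so the match stays on one line.
strict = re.findall(
    r'(?i)\bw(?:eigh)?t\b.{1,25}?\b(?<!\d/)(\d+(?:\.\d+)?)(?!/?\d)[^\S\r\n]*(\w+)',
    sample,
)

print(loose)   # [('80', 'Pulse')]
print(strict)  # []
```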

Wiktor Stribiżew · Accepted Answer · 2020-01-21 22:25:45Z

You may use

re.findall(r'(?i)\bw(?:eigh)?t\b.{1,25}?\b(?<!\d/)(\d+(?:\.\d+)?)(?!/?\d)\s*(\w+)', text)

See the regex demo

Details

(?i) - same as re.I - case insensitive mode on
\b - a word boundary
w(?:eigh)?t - wt or weight
\b - a word boundary
.{1,25}? - any 1 to 25 chars other than line break chars, as few as possible
\b - a word boundary
(?<!\d/) - a negative lookbehind that fails the match if immediately to the left of the current location there is a digit and /
(\d+(?:\.\d+)?) - Group 1: one or more digits followed by an optional sequence of a dot and one or more digits
(?!/?\d) - a negative lookahead that fails the match if immediately to the right of the current location there is an optional / and a digit
\s* - zero or more whitespace characters (line breaks included)
(\w+) - Group 2: one or more letters, digits or underscores.
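The breakdown above can also be written into the pattern itself using `re.VERBOSE`, which ignores whitespace and allows `#` comments inside the regex. This is only a sketch of the same pattern in a different layout; the name `WEIGHT_RE` is illustrative, and the flags are passed to `re.compile` instead of using the inline `(?i)`.

```python
import re

WEIGHT_RE = re.compile(r"""
    \b w(?:eigh)?t \b      # "wt" or "weight" as a whole word
    .{1,25}?               # 1 to 25 non-newline characters, as few as possible
    \b (?<!\d/)            # word boundary not preceded by a digit and "/"
    (\d+(?:\.\d+)?)        # group 1: integer or decimal number
    (?!/?\d)               # not followed by an optional "/" and a digit
    \s*                    # zero or more whitespace characters
    (\w+)                  # group 2: the unit
""", re.IGNORECASE | re.VERBOSE)

line = "Weight as of 11/11/1900 72.2 kg (159 lb 1.6 oz)"
print(WEIGHT_RE.findall(line))  # [('72.2', 'kg')]
```

The two lookarounds are what skip the date: `11` is rejected because `/1` follows it, and the later date parts are rejected because a digit and `/` precede them.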

See Python demo:

import re
# Backslashes are doubled so that \F is not read as an (invalid) escape sequence.
text = """Weight Measured: 80.7 kg (11/27/1900 24:59:00)\nPulse 64 \\F\\ Temp 37.3?C (99.1 ?F) \\F\\ Wt 101.2 kg (223 lb)\nWeight as of 11/11/1900 72.2 kg (159 lb 1.6 oz)\nResp. rate 16, height 177.8 cm (5' 10"), weight 84.7 kg (186 lb|\n11.2 oz)\nAnd one extra weight example 100lbs"""
print(re.findall(r'(?i)\bw(?:eigh)?t\b.{1,25}?\b(?<!\d/)(\d+(?:\.\d+)?)(?!/?\d)\s*(\w+)', text))
# => [('80.7', 'kg'), ('101.2', 'kg'), ('72.2', 'kg'), ('84.7', 'kg'), ('100', 'lbs')]
