Sample Data:

Weight Measured: 80.7 kg (11/27/1900 24:59:00)
Pulse 64 \F\ Temp 37.3?C (99.1 ?F) \F\ Wt 101.2 kg (223 lb)
Weight as of 11/11/1900 72.2 kg (159 lb 1.6 oz)
Resp. rate 16, height 177.8 cm (5' 10"), weight 84.7 kg (186 lb|
11.2 oz)
And one extra weight example 100lbs

Partially working Regex:

\b(?i)(?:weight|wt)\b(?:.){1,25}?\b(\d+\.?(?:\d+)).*?(\w+)\b

Current output:

('80.7', 'kg'), ('101.2', 'kg'), ('11', '11'), ('84.7', 'kg'), ('100', 'lbs')

Expected output:

('80.7', 'kg'), ('101.2', 'kg'), ('72.2', 'kg'), ('84.7', 'kg'), ('100', 'lbs')

How do I make my current regex ignore dates and capture the value that follows them? Also, how do I make the regex stop matching at the end of the line?

  • You should add the expected output as well. Commented Jan 21, 2020 at 22:08
  • Remove the dates before running your regex. It's a simple, failsafe pattern. Commented Jan 21, 2020 at 22:11
  • I don't intend to modify the data in any shape or form! Commented Jan 21, 2020 at 22:17
  • Ok, so nothing or whitespace. Use r'(?i)\bw(?:eigh)?t\b.{1,25}?\b(?<!\d/)(\d+(?:\.\d+)?)(?!/?\d)\s*(\w+)' Commented Jan 21, 2020 at 22:24
  • Well, \s can match newlines; replace it with [^\S\r\n] to match only horizontal whitespace. Commented Jan 21, 2020 at 22:33

1 Answer

You may use

re.findall(r'(?i)\bw(?:eigh)?t\b.{1,25}?\b(?<!\d/)(\d+(?:\.\d+)?)(?!/?\d)\s*(\w+)', text)

Details

  • (?i) - same as re.I - case insensitive mode on
  • \b - a word boundary
  • w(?:eigh)?t - wt or weight
  • \b - a word boundary
  • .{1,25}? - any 1 to 25 chars other than line break chars, as few as possible
  • \b - a word boundary
  • (?<!\d/) - a negative lookbehind that fails the match if immediately to the left of the current location there is a digit and /
  • (\d+(?:\.\d+)?) - Group 1: one or more digits followed by an optional sequence of a dot and one or more digits
  • (?!/?\d) - a negative lookahead that fails the match if immediately to the right of the current location there is an optional / and a digit
  • \s* - zero or more whitespace characters (line breaks included)
  • (\w+) - Group 2: one or more letters, digits or underscores.
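
The breakdown above can also be written into the pattern itself with re.VERBOSE. This is only a sketch: the name WEIGHT_RE and the shortened two-line sample are illustrative, and the flags are passed as arguments instead of the inline (?i).

```python
import re

# WEIGHT_RE is a hypothetical name; the pattern is the one from the answer,
# split across lines so each part carries its own comment.
WEIGHT_RE = re.compile(
    r"""
    \bw(?:eigh)?t\b     # wt or weight as a whole word
    .{1,25}?            # 1 to 25 chars other than line breaks, as few as possible
    \b(?<!\d/)          # word boundary, not directly preceded by a digit and a slash
    (\d+(?:\.\d+)?)     # group 1: integer or decimal number
    (?!/?\d)            # not followed by an optional slash and a digit
    \s*                 # optional whitespace
    (\w+)               # group 2: the unit
    """,
    re.IGNORECASE | re.VERBOSE,
)

# Two lines taken from the question's sample data.
sample = "Weight as of 11/11/1900 72.2 kg (159 lb 1.6 oz)\nAnd one extra weight example 100lbs"
matches = WEIGHT_RE.findall(sample)
print(matches)  # [('72.2', 'kg'), ('100', 'lbs')]
```

The date 11/11/1900 is skipped because each of its numbers is either followed by a slash and a digit or preceded by a digit and a slash.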

See Python demo:

import re
# Backslashes in the sample are doubled so that "\F" is not read as an (invalid) escape sequence.
text = """Weight Measured: 80.7 kg (11/27/1900 24:59:00)\nPulse 64 \\F\\ Temp 37.3?C (99.1 ?F) \\F\\ Wt 101.2 kg (223 lb)\nWeight as of 11/11/1900 72.2 kg (159 lb 1.6 oz)\nResp. rate 16, height 177.8 cm (5' 10"), weight 84.7 kg (186 lb|\n11.2 oz)\nAnd one extra weight example 100lbs"""
print(re.findall(r'(?i)\bw(?:eigh)?t\b.{1,25}?\b(?<!\d/)(\d+(?:\.\d+)?)(?!/?\d)\s*(\w+)', text))
# => [('80.7', 'kg'), ('101.2', 'kg'), ('72.2', 'kg'), ('84.7', 'kg'), ('100', 'lbs')]
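
To keep a match from running past the end of a line, as raised in the comments, \s* can be swapped for [^\S\r\n]*, which matches horizontal whitespace only. The following is a minimal sketch of the difference; the names ANY_WS and SAME_LINE and the two-line input are made up for illustration and are not part of the question's data.

```python
import re

# Pattern from the answer (flags passed as an argument instead of the inline (?i)).
ANY_WS = r"\bw(?:eigh)?t\b.{1,25}?\b(?<!\d/)(\d+(?:\.\d+)?)(?!/?\d)\s*(\w+)"
# Same pattern, but only horizontal whitespace may separate the number and the unit.
SAME_LINE = r"\bw(?:eigh)?t\b.{1,25}?\b(?<!\d/)(\d+(?:\.\d+)?)(?!/?\d)[^\S\r\n]*(\w+)"

# Made-up input: the first weight has no unit before the line break.
sample = "Weight 80\nPulse 64 Wt 101.2 kg"

crossing = re.findall(ANY_WS, sample, flags=re.I)
same_line = re.findall(SAME_LINE, sample, flags=re.I)

print(crossing)   # [('80', 'Pulse'), ('101.2', 'kg')]  - \s* ran into the next line
print(same_line)  # [('101.2', 'kg')]                   - the unit must be on the same line
```

The trade-off is that a value with no unit on its own line is dropped entirely, because group 2 is still required.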