
I know how to delete standalone numbers (numbers that are not part of a word) in Python with:

s = re.sub(r"^\d+\s|\s\d+\s|\s\d+$", " ", s)

I'm wondering whether it would be possible to perform the same action while keeping dates:

s = "I want to delete numbers like 84 but not dates like 2015"

In English a quick and dirty rule could be: if the number starts with 18, 19, or 20 and has length 4, don't delete.

1 Answer


To match any digit sequences other than 4-digit sequences starting with 18/19/20, you can use

r'\b(?!(?:18|19|20)\d{2}\b)\d+\b'


The regex matches:

  • \b - leading word boundary
  • (?!(?:18|19|20)\d{2}\b) - a negative lookahead that lets the following \d+ match only when the number is not 18, 19, or 20 followed by exactly two more digits and a word boundary, i.e. not a 4-digit number starting with 18, 19, or 20 (you can shorten the lookahead to (?!(?:1[89]|20)\d{2}\b), though many people find that harder to read)
  • \d+ - 1 or more digits
  • \b - trailing word boundary
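As a quick check of the breakdown above, here is a minimal sketch that lists which numbers the pattern picks up. The sample string and the variable names are made up for illustration; only the regex comes from the answer.

```python
import re

# Pattern from the answer: any whole number except a 4-digit one starting with 18, 19 or 20
pattern = re.compile(r'\b(?!(?:18|19|20)\d{2}\b)\d+\b')

# Hypothetical sample: a mix of year-like and non-year numbers
samples = "84 2015 1799 20155 199 1066 2100 1850"

# findall returns the full matches because the pattern has no capturing groups
matched = pattern.findall(samples)
print(matched)
# → ['84', '1799', '20155', '199', '1066', '2100']
```

Note that 20155 and 199 are still matched: the lookahead only protects numbers that are exactly four digits long.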

Python code:

import re

p = re.compile(r'\b(?!(?:18|19|20)\d{2}\b)\d+\b')
test_str = "Stack Overflow is a privately held website, the flagship site of the Stack Exchange Network, 4 5 6 created in 2008"
# sub() replaces every match, so 4, 5 and 6 are removed while 2008 is kept
print(p.sub("", test_str))
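Replacing matches with an empty string leaves the surrounding spaces behind. If that matters, one possible follow-up is to collapse the whitespace afterwards. This is only a sketch: the helper name strip_numbers_keep_years and the constant NON_YEAR_NUMBER are hypothetical, and it assumes that normalising all runs of whitespace to single spaces is acceptable.

```python
import re

# Same pattern as in the answer
NON_YEAR_NUMBER = re.compile(r'\b(?!(?:18|19|20)\d{2}\b)\d+\b')

def strip_numbers_keep_years(text):
    """Remove standalone numbers except year-like ones, then tidy the spacing."""
    without_numbers = NON_YEAR_NUMBER.sub("", text)
    # split() with no arguments drops leading/trailing whitespace and
    # splits on any whitespace run, so join() rebuilds single-spaced text
    return " ".join(without_numbers.split())

cleaned = strip_numbers_keep_years(
    "I want to delete numbers like 84 but not dates like 2015"
)
print(cleaned)
# → I want to delete numbers like but not dates like 2015
```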

4 Comments

Just thought about float values, and came up with \b(?!(?:18|19|20)\d{2}\b(?!\.\d))\d*\.?\d+\b.
Thanks for the quick reply, but with "Stack Overflow is a privately held website, the flagship site of the Stack Exchange Network, 4 5 6 created in 2008" it only removes the first number (4), not 5 and 6. See: regex101.com/r/tB0qR8/1
With respect to your first comment, thanks a lot for this improvement.
It works on regex101 if you add the /g flag. In Python, re.sub replaces all matches by default, so there is nothing extra to specify.
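The two points raised in the comments can be tried out with a short sketch. The sample strings and variable names are invented for illustration. Passing count=1 to sub() is used here only to imitate a single replacement, like a regex tester without the global flag.

```python
import re

# Integer-only pattern from the answer
INT_PATTERN = re.compile(r'\b(?!(?:18|19|20)\d{2}\b)\d+\b')
# Float-aware pattern from the first comment
FLOAT_PATTERN = re.compile(r'\b(?!(?:18|19|20)\d{2}\b(?!\.\d))\d*\.?\d+\b')

text = "4 5 6 created in 2008"

# Default behaviour: every match is replaced
all_replaced = INT_PATTERN.sub("", text)
# count=1 stops after the first replacement
first_only = INT_PATTERN.sub("", text, count=1)

print(repr(all_replaced))  # → '   created in 2008'
print(repr(first_only))    # → ' 5 6 created in 2008'

# The float-aware pattern matches decimals as a whole, including 2015.5,
# but still skips bare years such as 2015 and 1999
float_matches = FLOAT_PATTERN.findall("3.14 2015 2015.5 84 1999")
print(float_matches)       # → ['3.14', '2015.5', '84']
```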
