
I need some help with creating a regex string. I have this long list of numbers:

7001 7002 7003 7004 7005 7006 7007 7008 7009 7010 7011 7012 7013 7014 7015 7016 7017 7018 7019 7020 7021 7022 7023 7024 7025 7026 7027 7028 7029 7030 7031 7032 7033 7034 7035 7036 7037 7038 7039 7040 7041 7042 7043 7044 7045 7046 7047 7048 7049 7050 7051 7052 7053 7054 7055 7056 7057 7058 7059 7060 7061 7062 7063 7064 7065 7066 7067 7068 7069 7070 7071 7072 7073 7074 7075 7076 7077 7078 7079 7080 7081 7082 7083 7084 7085 7086 7087 7088 7089 7090 7091 7092 7093 7094 7095 7096 7097 7098 7099 7100 7101 7102 7103 7104 7105 7106 7107 7108 7109 7110 7111 7112 7113 7114 7115 7116 7117 7118 7119 7120 7121 7122 7123 7124 7125 7126 7127 7128 7129 7130 7131 7132 7133 7134 7135 7136 7137 7138 7139 7140 7141 7142 7143 7144 7145 7146 7147 7148 7149 7150 7151 7152 7153 7154 7155 7156 7157 7158 7159 7160 7161 7162 7163 7164 7165 7166 7167 7168 7169 7170 7171 7172 7173 7174 7175 7176 7177

Basically, I need to find the numbers that contain the digits 8 and 9 so I can remove them from the list.

I tried this regex: ([0-7][0-7][8-9]{2}), but it will only match numbers that strictly have both 8 and 9.
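A quick Python check makes the limitation of that attempt concrete. This is a sketch that rebuilds the list with range(7001, 7178), assuming the numbers really are the contiguous run 7001 to 7177 shown above:

```python
import re

# Rebuild the question's list (assumption: exactly 7001..7177, space-separated)
numbers = " ".join(str(n) for n in range(7001, 7178))

# The attempted pattern: two digits 0-7, then two digits that are each 8 or 9
attempt = re.findall(r"([0-7][0-7][8-9]{2})", numbers)
print(attempt)  # ['7088', '7089', '7098', '7099']

# Numbers with a single 8 or 9 in them are never matched
print("7008" in attempt, "7081" in attempt)  # False False
```

Only numbers whose last two digits are both 8 or 9 get through, so numbers such as 7008 or 7081 are missed.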

7 Answers

4

How about you just write some simple code rather than trying to cram everything into a regex?

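#!/usr/bin/perl -i -p
# -i edits the file in place; -p runs this code once per line ($_) and prints $_ afterwards

@n = split ' ';               # Split $_ on runs of whitespace (this also drops the trailing newline)
@n = grep { !/[89]/ } @n;     # @n now contains only those numbers NOT containing 8 or 9
$_ = join( ' ', @n ) . "\n";  # Rebuild the line and restore its newline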
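For comparison, here is the same split/filter/join idea as a small Python sketch; the function name drop_8_and_9 is just an illustrative choice, not anything from the answer:

```python
import re

def drop_8_and_9(line: str) -> str:
    """Return the space-separated numbers in `line` that contain neither an 8 nor a 9."""
    # str.split() with no argument splits on runs of whitespace, like Perl's split ' '
    kept = [n for n in line.split() if not re.search(r"[89]", n)]
    return " ".join(kept)

print(drop_8_and_9("7007 7008 7009 7010 7089 7100"))  # 7007 7010 7100
```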

1 Comment

In JavaScript that would be str.split(" ").filter(function(el){ return !/[89]/.test(el); }).join(" "); or in ES6: str.split(" ").filter(n => !/[89]/.test(n)).join(" ");
1

Dalorzo's answer would work, but I suggest a different approach:

/\b(?=\d{4}\b)(\d*[89]\d*)\b/g

Assuming you are only looking for 4-digit numbers, it uses a positive lookahead to make sure the number has exactly four digits (so it won't match, say, 3- or 5-digit numbers) and then checks that at least one of the digits is an 8 or a 9.

http://regex101.com/r/hW4vQ3
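A minimal Python sketch of the four-digit version, applied to a short made-up sample that includes a 3-digit and a 5-digit number. Python's re module has no /g flag; re.findall returns every match instead:

```python
import re

sample = "7007 7008 7081 7100 708 70089 7099"

# \b(?=\d{4}\b) only lets a match start at a number with exactly four digits;
# \d*[89]\d* then requires at least one 8 or 9 somewhere in it
four_digit = re.compile(r"\b(?=\d{4}\b)(\d*[89]\d*)\b")
print(four_digit.findall(sample))  # ['7008', '7081', '7099']
```

708 and 70089 contain an 8 but are skipped because they are not four digits long.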

If you need to catch numbers of any length, not just four-digit ones, then:

/\b(?=\d+\b)(\d*[89]\d*)\b/g

See it in action:

http://regex101.com/r/bW2gH3

And as a bonus, the regex also captures the numbers, so you can do a replace afterwards if you wish.
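Because each match covers a whole number, removing the numbers is a single substitution. A Python sketch on a made-up sample; collapsing the leftover spaces afterwards is an extra cleanup step assumed here, not part of the answer:

```python
import re

sample = "7007 7008 7081 7100 708 70089 7099"

# Any-length version: delete every whole number that contains an 8 or a 9
removed = re.sub(r"\b(?=\d+\b)(\d*[89]\d*)\b", "", sample)

# Deleting the numbers leaves runs of spaces behind, so collapse them
cleaned = " ".join(removed.split())
print(cleaned)  # 7007 7100
```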


0

This is a bit long-winded, but easier to decipher:

/\b([89]\d{3}|\d[89]\d{2}|\d{2}[89]\d|\d{3}[89])\b/g

It also restricts the search to 4-digit groups.
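One way to convince yourself that the alternation covers every case is to compare it against a plain "does the string contain 8 or 9" test over every four-digit string. A Python sketch:

```python
import re

alternation = re.compile(r"\b([89]\d{3}|\d[89]\d{2}|\d{2}[89]\d|\d{3}[89])\b")

# Check every four-digit string from 0000 to 9999
for n in range(10000):
    s = f"{n:04d}"
    has_8_or_9 = "8" in s or "9" in s
    assert bool(alternation.fullmatch(s)) == has_8_or_9, s
print("alternation agrees with the digit test on 0000..9999")
```

The trailing \b also keeps it from matching inside longer numbers such as 70089.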


0

How about:

/\b((?:[\d]+)?[89](?:[\d]+)?)\b/g

Online Demo

  • \b matches the beginning and the end of each number.
  • (?:[\d]+)? is an optional non-capturing group of digits; making it optional both before and after [89] lets the pattern match numbers that begin with, end with, or contain an 8 or 9.
  • ?: makes the group non-capturing. It isn't strictly required in this expression, but there was no need to capture the sub-groups.
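A quick Python check of this pattern on a few made-up numbers, including 7148 from the comments:

```python
import re

pattern = re.compile(r"\b((?:[\d]+)?[89](?:[\d]+)?)\b")

sample = "7100 7148 7180 8000 7009 7177"
print(pattern.findall(sample))  # ['7148', '7180', '8000', '7009']
```

Backtracking lets the optional groups give up digits until [89] lines up with an 8 or 9, so the digit can sit at the start, middle, or end of the number.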

3 Comments

I could be wrong, but given the OP's wording, I don't believe this completely answers the question, because several numbers containing 8 or 9 are left out.
Unfortunately that is still missing some. For example, 7148 is not being matched.
Just noticed I missed the ? after the second \d+. Sorry about that, I copied an earlier version by mistake :D
0

You can use this pattern:

[0-7]*(?:8[0-8]*9|9[0-9]*8)[0-9]*

or with a backreference:

(?:[0-9]*(?!\1)([89])){2}[0-9]*
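Note that this answer reads the question as "contains both an 8 and a 9". Below is a Python sketch of the first pattern on the rebuilt list, assuming the list is exactly 7001 to 7177; the \b anchors are an addition here to keep matches to whole numbers. The backreference version is only checked for compilation, because \1 appears before group 1 is defined, and Python's re module rejects such forward references:

```python
import re

numbers = " ".join(str(n) for n in range(7001, 7178))

# An 8 followed later by a 9 (no 9 in between), or a 9 followed later by an 8
both = re.compile(r"\b[0-7]*(?:8[0-8]*9|9[0-9]*8)[0-9]*\b")
print(both.findall(numbers))  # ['7089', '7098']

# The backreference variant refers to group 1 before it is defined,
# which Python's re module refuses to compile
try:
    re.compile(r"(?:[0-9]*(?!\1)([89])){2}[0-9]*")
except re.error as exc:
    print("re.error:", exc)
```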


0
import re
re.findall(r"\d\d[0-7][89]|\d\d[89][0-7]|\d\d[89][89]", x)

Works for the input given.
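That caveat matters: the pattern only looks for an 8 or 9 in the last two digits, which is enough here because every number in the list starts with 70 or 71. A small Python check, written without capturing groups so that re.findall returns plain strings (8001 and 7900 are made-up numbers outside the question's range):

```python
import re

one_liner = r"\d\d[0-7][89]|\d\d[89][0-7]|\d\d[89][89]"

print(re.findall(one_liner, "7008 7081 7100"))  # ['7008', '7081']
print(re.findall(one_liner, "8001 7900"))       # []
```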


0

Slightly simpler regex with lookahead:

(?=\d*[89])\d+


Demo
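A Python sketch of using this pattern to actually drop the numbers. It has no \b anchors, so it relies on the numbers being separated by non-digits such as spaces (the sample is made up):

```python
import re

sample = "7007 7008 7081 7100 708 70089"
has_8_or_9 = r"(?=\d*[89])\d+"

# (?=\d*[89]) succeeds only if an 8 or 9 appears later in the same run of digits;
# \d+ then consumes the digits from here to the end of that run
found = re.findall(has_8_or_9, sample)
print(found)  # ['7008', '7081', '708', '70089']

# Deleting the matches and collapsing the leftover spaces keeps only the other numbers
kept = " ".join(re.sub(has_8_or_9, "", sample).split())
print(kept)  # 7007 7100
```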
