5

I've been struggling with a regular expression all day and couldn't find a solution. I'm trying to find a specific number in strings that contain numbers, semicolons, colons and whitespace.

For our purposes, let's say I'm looking for the number 1234.

Here are a few examples that should match (every line is a different string):

1234
;1234;
1234 : 5678
;1234,3321

And examples that shouldn't match (because they contain a different number):

;12345;
0123456

My current attempt:

[^(0-9*)]1234[^(0-9*)]

Here is a permalink to Regex Tester with my problem: Regex Tester fiddle

3
  • 1
    In which language or tool are you going to use the pattern eventually? Also, this is not how character classes work; you are looking for negative lookarounds. Commented Aug 27, 2013 at 14:18
  • [^(0-9*)] matches any single character that is not a digit (0-9), a parenthesis (( or )) or a star (*). You may want to simply use [^0-9] (not a digit). Commented Aug 27, 2013 at 14:26
  • I'm going to use it in MySQL, with REGEXP in the WHERE clause. Commented Aug 27, 2013 at 14:28
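To make the point from the comments above concrete, here is a minimal sketch that runs the pattern from the question against the example strings. It uses Python's re module purely as an assumption for illustration (the question targets MySQL). Each bracket expression has to consume exactly one character, so the pattern cannot match when 1234 sits at the very start or end of the string.

```python
import re

# The pattern from the question, unchanged.
attempt = re.compile(r"[^(0-9*)]1234[^(0-9*)]")

samples = ["1234", ";1234;", "1234 : 5678", ";1234,3321", ";12345;", "0123456"]

# search() looks for the pattern anywhere in the string.
results = {s: bool(attempt.search(s)) for s in samples}
for s, matched in results.items():
    print(repr(s), matched)

# Parentheses and '*' are literal members of the negated class,
# so they are rejected as neighbours just like digits are.
print(attempt.search("(1234)"))  # None
```

Only ";1234;" and ";1234,3321" match here; "1234" and "1234 : 5678" are missed because nothing precedes the number.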

3 Answers 3

4

Maybe try this: ([^0-9]|^)1234([^0-9]|$). In this case you don't need lookaround features.
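As a quick check of the pattern above, the following sketch runs it against the strings from the question. Python's re module is only an assumption for testing the logic; the pattern itself relies on nothing beyond character classes, anchors and alternation.

```python
import re

pattern = re.compile(r"([^0-9]|^)1234([^0-9]|$)")

should_match = ["1234", ";1234;", "1234 : 5678", ";1234,3321"]
should_not_match = [";12345;", "0123456"]

# Every string in the first list should be found, none in the second.
matched = [s for s in should_match if pattern.search(s)]
wrongly_matched = [s for s in should_not_match if pattern.search(s)]

print(matched)          # all four strings
print(wrongly_matched)  # []
```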

You can use Debuggex to understand regular expressions better. It has a nice GUI for visualizing the pattern.


7 Comments

Note that this will fail to match the number at the beginning or end of the string.
The '1234' will be taken as repeat count for the set
In some implementations it will; not all of them use {n} for the repeat count.
@Jay: You have an example of one that doesn't? Every regex implementation i've ever seen that allows repeat counts, uses either {} or \{\} to delimit them.
@Jay the one used by the OP's regex tester (ECMAScript) doesn't. Neither do the most popular ones (Perl-based or POSIX-based). I'd be very interested in a counterexample though.
3

If your flavor supports lookahead and lookbehind, go with this:

(?<!\d)1234(?!\d)

Lookarounds test for the presence of characters without consuming them. A negative lookaround succeeds only when the given characters are not present.
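A small sketch of what "without consuming them" means in practice, using Python's re module as an assumed test bed: the lookaround version returns only the digits, while a version that matches the neighbouring characters includes them in the match.

```python
import re

lookaround = re.compile(r"(?<!\d)1234(?!\d)")
consuming = re.compile(r"(^|\D)1234($|\D)")

text = ";1234;"

m_look = lookaround.search(text)
m_cons = consuming.search(text)

# The lookaround match covers only the number itself.
print(m_look.group(0), m_look.span())  # 1234 (1, 5)

# The consuming match swallows the surrounding semicolons.
print(m_cons.group(0), m_cons.span())  # ;1234; (0, 6)

# Longer numbers containing 1234 are rejected.
print(lookaround.search(";12345;"))  # None
print(lookaround.search("0123456"))  # None
```

This matters mostly when the number has to be extracted or replaced; for a plain yes/no test, both forms give the same answer.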

If it supports word boundaries:

\b1234\b

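A word boundary is a position where a word character (a letter, digit or underscore) is next to a non-word character, such as whitespace or punctuation, or next to the start or end of the string.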
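The sketch below (Python's re module, assumed here only for illustration) checks \b1234\b against the strings from the question and shows where it differs from the lookaround version: letters and underscores count as word characters, so they do not create a boundary. For input limited to digits, semicolons, colons and whitespace, that difference should not come into play.

```python
import re

word_boundary = re.compile(r"\b1234\b")
lookaround = re.compile(r"(?<!\d)1234(?!\d)")

samples = ["1234", ";1234;", "1234 : 5678", ";1234,3321", ";12345;", "0123456"]
wb_results = {s: bool(word_boundary.search(s)) for s in samples}
for s, matched in wb_results.items():
    print(repr(s), matched)

# A letter or underscore next to the number: no word boundary there,
# but also no digit, so the two patterns disagree.
for s in ["abc1234", "_1234_"]:
    print(repr(s), bool(word_boundary.search(s)), bool(lookaround.search(s)))
```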

Otherwise check for non-digit characters and add string start and end:

(^|\D)1234($|\D)

If your engine does not even support \d and \D, replace them with [0-9] and [^0-9] respectively.

3 Comments

^ and $ in a character class are just literal characters. You need to use the anchors outside of it with an alternation like (?:^|\D)...(?:\D|$)
Aye, I was too fast. I've edited my post but omitted the non-capturing groups, since they weren't a requirement and would make the pattern more complicated.
Sure, that's fine. I just think that, even though it clutters up the pattern, it's one of the most important regex habits to get into. Someone who knows that normal parentheses capture might otherwise be confused about what we are capturing for, so I always try to be explicit and avoid any unnecessary overhead at the same time.
0

This might work:

.*[^0-9]*[1][2][3][4][^0-9]*.*

How it works:

.*             anything
[^0-9]*        zero or more characters that are not digits
[1][2][3][4]   "1234" done this way because it will be taken as a repeat count unless escaped
[^0-9]*        zero or more characters that are not digits
.*             anything

There might be an issue with strings that start or end with "1234" and have no other characters. The match for anything on the front and back may not be needed depending on the implementation of regex.
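For anyone who wants to try this pattern, here is a minimal check using Python's re module (an assumption for illustration, not the asker's MySQL setup). Because every part around the digits may match nothing, and .* may match digits, the pattern appears to behave like a plain search for the substring 1234.

```python
import re

answer_pattern = re.compile(r".*[^0-9]*[1][2][3][4][^0-9]*.*")
plain = re.compile(r"1234")

samples = ["1234", ";1234;", "1234 : 5678", ";1234,3321", ";12345;", "0123456"]

# Compare the long pattern with a bare substring search.
answer_results = {s: bool(answer_pattern.search(s)) for s in samples}
plain_results = {s: bool(plain.search(s)) for s in samples}
for s in samples:
    print(repr(s), answer_results[s], plain_results[s])

# In Python's re, bare digits in a pattern are literal characters,
# so both spellings match exactly the string "1234".
print(bool(re.fullmatch(r"1234", "1234")))          # True
print(bool(re.fullmatch(r"[1][2][3][4]", "1234")))  # True
```

Both columns come out True for all six strings, including ";12345;" and "0123456", which the question wants to exclude.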

4 Comments

Since the [^0-9] parts are optional, the .* can match everything except the 1234 (including adjacent digits). Also, there's no need to put the digits in character classes.
They will be taken as a repeat count unless escaped
No, they won't. That would be {1234}.
No, they won't. Repeat counts are in {}.
