1

I wrote a regular expression and tested it against a string in RegexPal.


Then I tested it in Objective-C with the same string and got no results.

    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"^Page(\\d+| \\d+)(:| :|)$" options:NSRegularExpressionCaseInsensitive error:&regError];

As you can see, I did add another '\' before the 'd', but I get no results at all. If I change the regex from "^Page(\d+| \d+)(:| :|)$" to "Page(\d+| \d+)(:| :|)", I get far too many results. It's as if my 'AND' conditions were understood as 'OR' conditions. Does anybody have an idea of what is happening?

EDIT: The regex "^page(\d+| \d+)(:| :|)$" applied to the string "page 15 :" returns 3 results: "page 15 :", "15", and ":". I only want the first one. Like I said, it's as if my AND were turned into an OR/AND. I would like the number and the colon (which is optional, as my regex says) to always stay attached to 'page'.

2 Answers

3

Turn on the multi-line option. Then the anchors ^ and $ will mean the beginning and end of each line.
By default, ^ and $ mean the beginning and end of the entire string.

In RegexPal you can see that the "Match at line-breaks (m)" option is checked.
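To make the difference concrete, here is a minimal sketch, assuming the match is done with NSRegularExpression as in the question. The helper name `CountPageMatches` and the sample text in the usage note are made up for illustration; `NSRegularExpressionAnchorsMatchLines` is the NSRegularExpression option that corresponds to the (m) flag.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: counts matches of the question's pattern in `text`.
// When `anchorsMatchLines` is YES, ^ and $ match at every line boundary;
// when it is NO, they only match at the start and end of the whole string.
static NSUInteger CountPageMatches(NSString *text, BOOL anchorsMatchLines) {
    NSRegularExpressionOptions options = NSRegularExpressionCaseInsensitive;
    if (anchorsMatchLines) {
        options |= NSRegularExpressionAnchorsMatchLines;
    }

    NSError *regError = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"^Page(\\d+| \\d+)(:| :|)$"
                                                  options:options
                                                    error:&regError];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", regError);
        return 0;
    }

    return [regex numberOfMatchesInString:text
                                  options:0
                                    range:NSMakeRange(0, text.length)];
}
```

With a multi-line input such as @"Intro\npage 15 :\nBody\nPage 2\n", calling the helper with NO should find nothing, because the string as a whole does not start with "Page", while calling it with YES should find the two "page" lines.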

Edit:
If you are getting too much sub-expression data, replace the capturing groups with cluster (non-capturing) groups: (..) -> (?: ..).
The (?: ..) form is one of the 'extended' constructs.

If you can't do that, just use the data in group 0, which is the entire match, and ignore the rest. I'm not sure how to do this in Objective-C, though.
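For what it's worth, taking only group 0 with NSRegularExpression could look roughly like the sketch below. It assumes the pattern from the question with the multi-line option turned on; the helper name `WholeMatches` is made up for illustration. Each NSTextCheckingResult exposes the whole match through its `range` property, which is the same range as `rangeAtIndex:0`.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text of group 0 (the entire match) for
// every match in `text`, ignoring the capture groups completely.
static NSArray<NSString *> *WholeMatches(NSString *text) {
    NSError *regError = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"^Page(\\d+| \\d+)(:| :|)$"
                                                  options:(NSRegularExpressionCaseInsensitive |
                                                           NSRegularExpressionAnchorsMatchLines)
                                                    error:&regError];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", regError);
        return @[];
    }

    NSMutableArray<NSString *> *found = [NSMutableArray array];
    NSArray<NSTextCheckingResult *> *results =
        [regex matchesInString:text options:0 range:NSMakeRange(0, text.length)];
    for (NSTextCheckingResult *result in results) {
        // result.range is group 0; rangeAtIndex:1 and rangeAtIndex:2 would be
        // the sub-expressions, which are simply not read here.
        [found addObject:[text substringWithRange:result.range]];
    }
    return found;
}
```

For the string "page 15 :" this should give back a single entry, "page 15 :", rather than the three pieces mentioned in the question.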


2 Comments

For NSRegularExpression, that is the NSRegularExpressionAnchorsMatchLines option.
My other problem was that the regex "^Page(\d+| \d+)(:| :|)$" applied to the string "page 15 :" returns 3 results: "page 15 :", "15", and ":". I only want the first one. Like I said, it's as if my AND were turned into an OR/AND (both, weirdly!).
1

As pointed out by sln (he solved that issue), you don't match anything in Objective-C because you have to turn the multi-line option on (the m flag), and you do match in RegexPal because it is on there.

Regarding your regex, it could be improved to ^(Page\s*\d+\s*:?\s*)$. The question mark means that the preceding character is optional, and \s matches any whitespace character (space, tab, etc.).

Regarding your selection issue, parentheses in a regex are what capture values. So if you write (Page( \d+|\d+)) you'll get two different captured values. What you wanted was (Page(?: \d+|\d+)), since (?: ) groups like parentheses without capturing anything. But | isn't usually used when a simple ? does the trick.
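To see the effect of (?: ) from Objective-C, here is a small sketch. The helper name `RangeCountForFirstMatch` is made up for illustration; it reports how many ranges the first match carries, which for NSRegularExpression is the number of capturing groups plus one for the whole match.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns numberOfRanges for the first match of
// `pattern` in `text` (group 0 plus one range per capturing group),
// or 0 if the pattern is invalid or nothing matches.
static NSUInteger RangeCountForFirstMatch(NSString *pattern, NSString *text) {
    NSError *regError = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&regError];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", regError);
        return 0;
    }

    NSTextCheckingResult *first =
        [regex firstMatchInString:text options:0 range:NSMakeRange(0, text.length)];
    return first ? first.numberOfRanges : 0;
}
```

With the original pattern @"^Page(\\d+| \\d+)(:| :|)$" and the string "page 15 :", this should report 3 ranges (the whole match, "15" with its leading space, and the colon part). With the non-capturing variant @"^Page(?:\\d+| \\d+)(?::| :|)$" it should report just 1, the whole match.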

1 Comment

Yes, thank you! That did the trick. Now I have another problem, but I think I will open another question ;) Thanks!
