
I have an Android application in which I'm trying to parse a small XML response coming from a server. I simply used a regex for this, but now I'm having trouble with optional XML tags that may or may not be present in the response payload. My regex is declared as

final String regex = "^(?=.*<x>(.+)</x>)(?=.*<r>(.+)</r>)?(?=.*<e>(.+)</e>)" +
            "(?=.*<h>(.+)</h>)(?=.*<y>(.+)</y>)(?=.*<n>(.+)</n>).*$";

Note the question mark after the second lookahead group, which is meant to make the <r> tag optional. It causes an exception when the pattern is compiled with Pattern.compile:

Pattern p = Pattern.compile(regex);
...
12-19 10:19:21.257: E/AndroidRuntime(2342): Caused by: java.util.regex.PatternSyntaxException: Syntax error U_REGEX_RULE_SYNTAX near index 36:
12-19 10:19:21.257: E/AndroidRuntime(2342): ^(?=.*<x>(.+)</x>)(?=.*<r>(.+)</r>)?(?=.*<e>(.+)</e>)(?=.*<h>(.+)</h>)(?=.*<y>(.+)</y>)(?=.*<n>(.+)</n>).*$
12-19 10:19:21.257: E/AndroidRuntime(2342):                                     ^
12-19 10:19:21.257: E/AndroidRuntime(2342):     at java.util.regex.Pattern.compileImpl(Native Method)
12-19 10:19:21.257: E/AndroidRuntime(2342):     at java.util.regex.Pattern.compile(Pattern.java:400)
12-19 10:19:21.257: E/AndroidRuntime(2342):     at java.util.regex.Pattern.<init>(Pattern.java:383)
12-19 10:19:21.257: E/AndroidRuntime(2342):     at java.util.regex.Pattern.compile(Pattern.java:374)

Without the ? quantifier, it compiles fine.

I also tried the same pattern on Windows with desktop Java (JDK 1.6.0_24), and it worked as expected (it even parsed my test payload correctly). Are there known limitations in the native regex implementation on Android, or am I just missing something? I have already read the Android documentation for Pattern, to no avail. Any help would be appreciated; I'd rather not resort to a dedicated XML parser for something this simple.

  • I'm not sure about quantified lookaheads (they might not be supported), so you might try changing that part to (?=(?:.*<r>(.+)</r>)?), i.e. use an inner non-capturing quantified group. Commented Dec 19, 2011 at 9:19
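The commenter's variant keeps the lookahead itself unquantified and puts the optional part inside it, so the lookahead always succeeds and the ? applies to an ordinary non-capturing group. A minimal desktop-Java sketch reduced to the <r> tag only; the class name, the rValue helper and the payloads are hypothetical, and I have not verified this form against Android's engine.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class InnerOptionalDemo {
    // The lookahead is not quantified; the optional (?:...)? group lives inside it,
    // so the lookahead succeeds whether or not <r> is present.
    static final Pattern R_TAG = Pattern.compile("^(?=(?:.*<r>(.+)</r>)?).*$");

    // Returns the <r> payload, or null when the tag is absent.
    static String rValue(String payload) {
        Matcher m = R_TAG.matcher(payload);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(rValue("<x>1</x><r>2</r>")); // 2
        System.out.println(rValue("<x>1</x>"));         // null
    }
}
```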

1 Answer


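You can't portably make a lookahead optional by appending a ?. In some regex dialects this is a syntax error, including the ICU engine behind Android's java.util.regex (hence the U_REGEX_RULE_SYNTAX error), because a lookahead is a zero-width assertion: it matches no characters that the ? could quantify. Desktop Java happens to accept it, which is why the same pattern works there.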
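To see why there is nothing for the ? to quantify, compare how much input a lookahead and an ordinary group consume. A minimal desktop-Java sketch; the class name and the consumed helper are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ZeroWidthDemo {
    // Number of characters consumed by lookingAt() at the start of input, or -1 if no match.
    static int consumed(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.lookingAt() ? m.end() - m.start() : -1;
    }

    public static void main(String[] args) {
        String payload = "<r>2</r>";
        // A lookahead only asserts what follows; the match it produces is empty.
        System.out.println(consumed("(?=<r>)", payload)); // 0
        // An ordinary non-capturing group consumes the characters it matches.
        System.out.println(consumed("(?:<r>)", payload)); // 3
    }
}
```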

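But you can wrap the lookahead in an optional non-capturing group. Since the ? is greedy, the engine tries the lookahead first: if <r> is present, group 2 captures its content; otherwise the group is skipped and group 2 is null. Either way, whether the overall match succeeds depends only on the other tags: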

final String regex = "^(?=.*<x>(.+)</x>)(?:(?=.*<r>(.+)</r>))?(?=.*<e>(.+)</e>)" +
            "(?=.*<h>(.+)</h>)(?=.*<y>(.+)</y>)(?=.*<n>(.+)</n>).*$";
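A minimal desktop-Java sketch that runs the answer's pattern against two made-up payloads, one with <r> and one without. The class name, the parse helper and the payloads are hypothetical; the asker reports that the same wrapped form also works on Android.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OptionalLookaheadDemo {
    // The answer's regex: the <r> lookahead is wrapped in an optional non-capturing group.
    static final Pattern PATTERN = Pattern.compile(
            "^(?=.*<x>(.+)</x>)(?:(?=.*<r>(.+)</r>))?(?=.*<e>(.+)</e>)"
            + "(?=.*<h>(.+)</h>)(?=.*<y>(.+)</y>)(?=.*<n>(.+)</n>).*$");

    // Returns the values of groups 1..6 (x, r, e, h, y, n); r is null when <r> is missing.
    // Returns null if any required tag is missing.
    static String[] parse(String payload) {
        Matcher m = PATTERN.matcher(payload);
        if (!m.matches()) {
            return null;
        }
        String[] values = new String[m.groupCount()];
        for (int i = 0; i < values.length; i++) {
            values[i] = m.group(i + 1);
        }
        return values;
    }

    public static void main(String[] args) {
        String withR = "<x>1</x><r>2</r><e>3</e><h>4</h><y>5</y><n>6</n>";
        String withoutR = "<x>1</x><e>3</e><h>4</h><y>5</y><n>6</n>";
        System.out.println(Arrays.toString(parse(withR)));    // [1, 2, 3, 4, 5, 6]
        System.out.println(Arrays.toString(parse(withoutR))); // [1, null, 3, 4, 5, 6]
    }
}
```

One caveat: . does not match line terminators by default, so a payload spread over several lines needs Pattern.DOTALL (or an inline (?s)) for these lookaheads to find tags on later lines.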

2 Comments

This may fix the syntax error, but I'm not sure it makes sense to have an optional lookahead at all.
Yes, this did the trick. Thank you! And yes, I am aware that parsing XML this way is somewhat unconventional; I was just curious how to pull it off with regexes. I was also wondering why this would not work on Android when the same thing worked in Python and desktop Java, but, as said above, dialects can differ.
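For comparison with the regex approach, here is a minimal sketch of the dedicated-parser route using the standard javax.xml.parsers DOM API, which is available on both desktop Java and Android. It assumes the payload is well-formed XML with a single root element; the <resp> root, the class name and the helpers are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

final class DomTagReader {
    // Parses the payload into a DOM tree; wraps checked parser exceptions.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse payload", e);
        }
    }

    // Text of the first element with the given tag name, or null if the tag is absent.
    static String text(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent() : null;
    }

    public static void main(String[] args) {
        // Hypothetical payload wrapped in an assumed <resp> root element.
        Document doc = parse("<resp><x>1</x><e>3</e></resp>");
        System.out.println(text(doc, "x")); // 1
        System.out.println(text(doc, "r")); // null
    }
}
```

Optional tags need no special handling here: a missing element simply yields an empty NodeList.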
