I get a string result like this from a system, and I need to capture all the hex values, excluding the 0x prefix:

[System Info] 2.20.02 2.20.02 - Extended Data: 
0xAC, 0x4D, 0xDE, 0x04, 0xA4, 0x10, 0x73, 0x89, 0xDF, 0xFF, 0x01, 0x01, 0x01, 0xDF, 0x5A, 0x10, 
0x34, 0x37, 0x35, 0x36, 0x33, 0xC1, 0x10, 0x2A, 0x2A, 0x2A, 0x2A, 0x2A, 0x37, 0x38, 0x31, 0x32, 
0x9F, 0xDD, 0x01, 0xB5, 0x42, 0x03, 0x45, 0x56, 0x33, 0x2F, 0x02, 0x06, 0x00, 0x00, 0x00, 0x00, 
0x00, 0x15, 0xA3, 0x21, 0x03, 0x09, 0x51, 0x09, 0x9A, 0xE5, 0x16, 0x12, 0x21, 0x9F, 0x34, 0x03, 
0x03, 0x1E, 0x03, 0xCE, 0x04, 0x00, 0x12, 0x00, 0x00, 0xDF, 0xFF, 0x02, 0x01, 0x1A,

I have created a function which can help me extract substrings into an array:

+ (NSArray *) regexPattern:(NSString *)pattern toExtract:(NSString *)string{
    NSError *error = nil;
    NSRegularExpression * regexp = [NSRegularExpression regularExpressionWithPattern:pattern
                                    options:NSRegularExpressionCaseInsensitive error:&error];
    if (regexp == nil) { return nil; } // invalid pattern; details are in error
    // matchesInString: returns one NSTextCheckingResult per match, so nothing needs to be removed.
    NSArray * matches = [regexp matchesInString:string options:0 range:NSMakeRange(0, [string length])];
    NSMutableArray * result = [[NSMutableArray alloc] init];
    for (NSTextCheckingResult * match in matches) {
        [result addObject:[string substringWithRange:[match range]]];
    }
    return [result autorelease]; // balance alloc/init under manual reference counting
}

But now the problem is the regex. I have tried to use a capture group () to capture only the hex part, using this pattern: 0x(..),. This pattern captures the whole 0xFD, instead of just FD. If I use ([\dA-F]){2}, I get all the hex values, but I also capture 20 and 02 from 2.20.02 2.20.02, which I don't want. Some website told me that I would only get the data between the capture brackets, but that's not the case. Can somebody help? Thanks.

2 Answers

3

In short, don't. Regular expressions are really useful, but not for such a well defined, simple, set of input.

See the top answer here for an explanation: RegEx match open tags except XHTML self-contained tags

Instead, use NSScanner. It is quite adept at scanning hex strings and skipping characters as needed. It'll be faster and more sane (the problem with regular expressions is that the fuzzy nature of the matching yields a parser that can often be easily spoofed, confused, or hacked by purposefully mal-constructed input).

This is a pretty good starting point:

Objective-C parse hex string to integer

I'd start by finding the "Extended Data:" marker, then use the scanner to skip the 0x, scan a hex number, skip the ", 0x" separator, and so on.
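The scanning steps above could be sketched roughly as follows. This is only a minimal sketch, not the answerer's own code: the function name HexBytesAfterExtendedData is hypothetical, and it assumes every value after the marker is a single byte separated by commas and whitespace. It relies on the fact that scanHexInt: accepts an optional 0x prefix, so the prefix does not have to be skipped by hand.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post): returns the values that
// follow "Extended Data:" as two-digit uppercase hex strings, e.g. @"AC".
static NSArray *HexBytesAfterExtendedData(NSString *input) {
    NSMutableArray *result = [NSMutableArray array];

    // Locate the marker first so version numbers such as 2.20.02 are never scanned.
    NSRange marker = [input rangeOfString:@"Extended Data:"];
    if (marker.location == NSNotFound) {
        return result; // marker missing: nothing to parse
    }
    NSString *payload = [input substringFromIndex:NSMaxRange(marker)];
    NSScanner *scanner = [NSScanner scannerWithString:payload];

    // Treat commas like whitespace so the scanner steps over the separators.
    NSMutableCharacterSet *skip = [NSMutableCharacterSet whitespaceAndNewlineCharacterSet];
    [skip addCharactersInString:@","];
    [scanner setCharactersToBeSkipped:skip];

    unsigned int value = 0;
    // scanHexInt: consumes an optional 0x/0X prefix followed by hex digits.
    while ([scanner scanHexInt:&value]) {
        [result addObject:[NSString stringWithFormat:@"%02X", value]];
    }
    return result;
}
```

Scanning stops at the first token that is not a hex number, so trailing text after the data would end the loop rather than produce spurious values.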

3 Comments

did you mean "regular expressions are really useful, especially for such a well defined, simple set of input" ?
@ChenLiYong Nope. Regular expressions are a giant pain in the ass in all contexts(1). Where they are very useful is in dealing with relatively unstructured input where you need to fuzzy match to pull subsets of data. For well defined structured input you should pretty much never use a regex vs. something as simple as NSScanner, state machine, or a proper parser. (1)I've used, and continue to use, regular expressions often. But not for tasks like this.
Oh I see. I'll start reading about NSScanner then. I'd never heard of it before. From your explanation, I think NSScanner works similarly to a state machine. Thanks.
1

You can use 0x(..), as your regular expression. However, [match range] is the range of the whole match, so the line [result addObject:[string substringWithRange:[match range]]]; adds the entire matched portion of the string (for example 0xFD,). When iterating through the matches, you need to add only the first capture group (the portion in parentheses), which is available through rangeAtIndex:1.

You could do it like this:

for (NSTextCheckingResult * match in matches) {
    NSRange groupRange = [match rangeAtIndex:1];
    [result addObject:[string substringWithRange:groupRange]];
}
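For context, here is a minimal, self-contained sketch of the same idea as a complete function. The function name CapturedHexBytes and the pattern 0x([0-9A-F]{2}) are illustrative assumptions rather than part of the original code; this pattern does not depend on a trailing comma and cannot match the version numbers because they have no 0x prefix.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns only the two hex digits that follow each "0x".
static NSArray *CapturedHexBytes(NSString *input) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"0x([0-9A-F]{2})"
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return nil;
    }

    NSMutableArray *result = [NSMutableArray array];
    NSArray *matches = [regex matchesInString:input
                                      options:0
                                        range:NSMakeRange(0, [input length])];
    for (NSTextCheckingResult *match in matches) {
        // Index 0 is the whole match ("0xAC"); index 1 is the first capture group ("AC").
        NSRange group = [match rangeAtIndex:1];
        if (group.location != NSNotFound) {
            [result addObject:[input substringWithRange:group]];
        }
    }
    return result;
}
```

With more than one capture group in the pattern, rangeAtIndex:2, rangeAtIndex:3, and so on give the later groups in the same way.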

1 Comment

Wait, so you're saying that the capture groups in the result are available through rangeAtIndex:? So if I have 10 capture groups in one regex, I can get each group's content that way? Oh I see!
