I am using the following extension method to get an array with the NSRange of every occurrence of a substring:

extension String {
  func nsRangesOfString(findStr:String) -> [NSRange] {
    let ranges: [NSRange]
    do {
      // Create the regular expression.
      let regex = try NSRegularExpression(pattern: findStr, options: [])

      // Use the regular expression to get an array of NSTextCheckingResult.
      // Use map to extract the range from each result.
      ranges = regex.matches(in: self, options: [], range: NSMakeRange(0, self.characters.count)).map {$0.range}
    }
    catch {
      // There was a problem creating the regular expression
      ranges = []
    }
    return ranges
  }
}

However, I can't figure out why it sometimes fails. Here are two similar cases; one works and the other doesn't:

This one works:

self(String):

"וצפן (קרי: יִצְפֹּ֣ן) לַ֭יְשָׁרִים תּוּשִׁיָּ֑ה מָ֝גֵ֗ן לְהֹ֣לְכֵי תֹֽם׃"

findStr:

"קרי:"

And this one doesn't:

self(String):

"לִ֭נְצֹר אָרְח֣וֹת מִשְׁפָּ֑ט וְדֶ֖רֶךְ חסידו (קרי: חֲסִידָ֣יו) יִשְׁמֹֽר׃"

findStr:

"קרי:"

(An alternative, more reliable method would also be an acceptable answer.)

  • I'm sorry but would you kindly convert sample strings to English? Commented Sep 19, 2017 at 6:26
  • I could, but those aren't just random strings; they are the real strings being matched in my app, and I want to figure out why the second one returns nothing. Commented Sep 19, 2017 at 6:29
  • Is it mandatory to use regex for this task? Commented Sep 19, 2017 at 6:31
  • Negative. Another suggestion is welcome. Commented Sep 19, 2017 at 6:32
  • What is it that you are actually trying to do here? What is the end goal? Happy to suggest another way but I don’t know what your requirement is for this. Commented Sep 19, 2017 at 6:41

1 Answer

NSRange values are specified in terms of UTF-16 code units (which is what NSString uses internally), so the length must be self.utf16.count:

        ranges = regex.matches(in: self, options: [],
                               range: NSRange(location: 0, length: self.utf16.count))
            .map {$0.range}

In the case of your second string we have

let s2 = "לִ֭נְצֹר אָרְח֣וֹת מִשְׁפָּ֑ט וְדֶ֖רֶךְ חסידו (קרי: חֲסִידָ֣יו) יִשְׁמֹֽר׃"
print(s2.characters.count) // 46
print(s2.utf16.count)      // 74

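and that's why the pattern is not found with your code: the search range ends after 46 UTF-16 code units, which is before the position of the match in this string. In your first string the match sits near the start, so it still falls inside the too-short range.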
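The same effect can be reproduced on a much smaller input. The following is only a sketch, assuming Swift 4 or later (where `String.count` replaces `characters.count`); the sample strings are my own illustration and not taken from the question. A Hebrew letter followed by a combining vowel point is one Character but two UTF-16 code units, so a range built from the Character count stops short of the end of the string:

```swift
import Foundation

// BET + SHEVA: one grapheme cluster (Character), two UTF-16 code units.
let pointed = "\u{05D1}\u{05B0}"
print(pointed.count)                 // 1
print(pointed.utf16.count)           // 2
print((pointed as NSString).length)  // 2

// Illustrative text: 3 Characters, 5 UTF-16 code units, with "x" at UTF-16 offset 4.
let text = pointed + pointed + "x"
let regex = try! NSRegularExpression(pattern: "x", options: [])

// Range built from the Character count: covers offsets 0..<3 and misses "x".
let tooShort = NSRange(location: 0, length: text.count)
// Range built from the UTF-16 count: covers the whole string.
let full = NSRange(location: 0, length: text.utf16.count)

let missed = regex.numberOfMatches(in: text, options: [], range: tooShort)
let found = regex.numberOfMatches(in: text, options: [], range: full)
print(missed)  // 0
print(found)   // 1
```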

Starting with Swift 4, you can also compute an NSRange for the entire string as

NSRange(self.startIndex..., in: self)
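Putting this together, here is one possible shape for the whole extension, as a sketch assuming Swift 4 or later. The method names `nsRanges(matching:)` and `nsRanges(ofLiteral:)` are hypothetical names chosen for illustration. The second method is a regex-free alternative, since the question says one would be welcome; it relies on Foundation's `range(of:options:range:locale:)` and converts each result with `NSRange(_:in:)`.

```swift
import Foundation

extension String {
    /// Regex-based variant (hypothetical name): the NSRange of every match of `pattern`.
    func nsRanges(matching pattern: String) -> [NSRange] {
        guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else {
            return []  // the pattern was not a valid regular expression
        }
        // The whole string, measured in UTF-16 code units.
        let whole = NSRange(startIndex..., in: self)
        return regex.matches(in: self, options: [], range: whole).map { $0.range }
    }

    /// Regex-free variant (hypothetical name): the NSRange of every
    /// non-overlapping occurrence of `needle`, treated as plain text.
    func nsRanges(ofLiteral needle: String) -> [NSRange] {
        var result: [NSRange] = []
        var searchStart = startIndex
        while searchStart < endIndex,
              let found = range(of: needle, range: searchStart..<endIndex),
              !found.isEmpty {
            result.append(NSRange(found, in: self))
            searchStart = found.upperBound  // continue after this occurrence
        }
        return result
    }
}

// Usage with the second string from the question:
let s2 = "לִ֭נְצֹר אָרְח֣וֹת מִשְׁפָּ֑ט וְדֶ֖רֶךְ חסידו (קרי: חֲסִידָ֣יו) יִשְׁמֹֽר׃"
print(s2.nsRanges(matching: "קרי:"))
print(s2.nsRanges(ofLiteral: "קרי:"))
```

If the search string is meant as plain text but may contain regex metacharacters such as `.` or `(`, the regex variant would need the pattern passed through `NSRegularExpression.escapedPattern(for:)` first; the literal variant avoids that concern.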

3 Comments

Awesome. Is it safe to use utf16.count for all other types of string (including English)?
@sCha: s.utf16.count is the number of UTF-16 code units in a string, and is always the same as (s as NSString).length, no matter what language. If you need an NSRange/NSString-compatible count, then that's the correct method.
