
I have a string "323 ECO Economics Course 451 ENG English Course 789 Mathematical Topography". I want to split this string using the regular expression [0-9][0-9][0-9][A-Z][A-Z][A-Z] so that the function returns the array:

Array = 
["323 ECO Economics Course ", "451 ENG English Course",  "789 Mathematical Topography"]

How would I go about doing this in Swift?

Edit: My question is different from the one linked to. I realize that you can split a string in Swift using myString.components(separatedBy: "splitting string"). The issue is that that question doesn't address how to make the splitting string a regular expression. I tried using mystring.components(separatedBy: "[0-9][0-9][0-9][A-Z][A-Z][A-Z]", options: .regularExpression), but that didn't work.

How can I make the separatedBy: portion a regular expression?

  • Perhaps you are looking at this wrong. Instead of trying to find a fancy way to "split" a string using a regex, why not simply use the NSRegularExpression class and its matches function to get all of the matches of your regex? Commented Feb 27, 2017 at 1:55
  • The answer below is a great answer; however, after reading your question, I thought you might find this useful: a Regex class written in Swift that can be dropped into your project. I've used it in multiple projects with ease and success. gist.github.com/ningsuhen/dc6e589be7f5a41e7794 Commented Feb 27, 2017 at 2:06
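Following the first comment's suggestion, here is a minimal sketch (not taken from any answer below) that collects the matches directly with NSRegularExpression instead of "splitting". The pattern [0-9]{3}[A-Za-z ]+ is an assumption: three digits followed by a run of letters and spaces, which naturally stops at the next course number.

```swift
import Foundation

let input = "323 ECO Economics Course 451 ENG English Course 789 Mathematical Topography"

// Assumed pattern: a 3-digit number followed by letters and spaces
let regex = try! NSRegularExpression(pattern: "[0-9]{3}[A-Za-z ]+")

// NSRange covering the whole string, measured in UTF-16 code units
let nsRange = NSRange(input.startIndex..., in: input)

let pieces = regex.matches(in: input, range: nsRange).compactMap { match -> String? in
    // Convert each NSRange back to a Range<String.Index>
    guard let r = Range(match.range, in: input) else { return nil }
    // Drop the space that separates a piece from the next course number
    return input[r].trimmingCharacters(in: .whitespaces)
}

print(pieces)
```

Because each match ends where the next number begins, no explicit "split" step is needed.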

4 Answers


You can use the regex "\\b[0-9]{1,}[a-zA-Z ]{1,}" together with this extension from this answer, which returns all ranges of a substring using literal, caseInsensitive or regularExpression search:

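import Foundation

extension StringProtocol {
    /// Returns the ranges of every occurrence of `string`, honoring `options`
    /// (e.g. .caseInsensitive or .regularExpression).
    func ranges<S: StringProtocol>(of string: S, options: String.CompareOptions = []) -> [Range<Index>] {
        var result: [Range<Index>] = []
        var startIndex = self.startIndex
        while startIndex < endIndex,
              let range = self[startIndex...].range(of: string, options: options) {
            result.append(range)
            // Continue after this match; for an empty match, advance one
            // position so the loop cannot get stuck
            startIndex = range.lowerBound < range.upperBound ? range.upperBound :
                index(range.lowerBound, offsetBy: 1, limitedBy: endIndex) ?? endIndex
        }
        return result
    }
}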

let inputString = "323 ECO Economics Course 451 ENG English Course 789 Mathematical Topography"

let courses = inputString.ranges(of: "\\b[0-9]{1,}[a-zA-Z ]{1,}", options: .regularExpression).map { inputString[$0].trimmingCharacters(in: .whitespaces) }

print(courses)   //   ["323 ECO Economics Course", "451 ENG English Course", "789 Mathematical Topography"]

2 Comments

If your course codes always have 3 digits and your course names have at least 3 characters, you can use the regex "\\b[0-9]{3}[a-zA-Z ]{3,}"
This is a nice clean solution. I like how you build an array of ranges and then use map to extract the substrings from the original string. Very elegant use of functional programming. (Voted)

At the time of writing, Swift didn't have native regular expressions (the Regex type arrived later, in Swift 5.7). But Foundation provides NSRegularExpression.

import Foundation

let toSearch = "323 ECO Economics Course 451 ENG English Course 789 MAT Mathematical Topography"

let pattern = "[0-9]{3} [A-Z]{3}"
let regex = try! NSRegularExpression(pattern: pattern, options: [])

// NSRegularExpression works with Objective-C NSString, which is UTF-16 encoded,
// so the search range is measured in UTF-16 code units
let matches = regex.matches(in: toSearch, range: NSRange(location: 0, length: toSearch.utf16.count))

// The combination of zip, dropFirst and map to Optional here is a trick
// to be able to map over [(result1, result2), (result2, result3), (result3, nil)]
let nextMatches: [NSTextCheckingResult?] = matches.dropFirst().map { Optional.some($0) } + [nil]
let results = zip(matches, nextMatches).map { current, next -> String in
    // Convert the match's NSRange into a String.Index (Swift 4+ API)
    let start = Range(current.range, in: toSearch)!.lowerBound
    // If there's a next match, use its starting location as the end of this piece;
    // otherwise, go to the end of the searched string
    let end = next.flatMap { Range($0.range, in: toSearch)?.lowerBound } ?? toSearch.endIndex

    return String(toSearch[start..<end])
}

dump(results)

Running this will output

▿ 3 elements
  - "323 ECO Economics Course "
  - "451 ENG English Course "
  - "789 MAT Mathematical Topography"
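A shorter sketch of the same idea (cut the string just before each code and keep the code) uses a regular-expression replacement to insert a marker, then the plain components(separatedBy:) from the question. The marker character U+001F is an assumption; it must not already occur in the input.

```swift
import Foundation

let input = "323 ECO Economics Course 451 ENG English Course 789 MAT Mathematical Topography"

// Assumed marker: U+001F (unit separator), expected not to appear in the input
let marker = "\u{1F}"

// With .regularExpression the replacement is a template, so "$0"
// re-inserts the matched course code right after the marker
let marked = input.replacingOccurrences(of: "[0-9]{3} [A-Z]{3}",
                                        with: marker + "$0",
                                        options: .regularExpression)

// Split on the marker and drop the empty piece before the first code
let pieces = marked.components(separatedBy: marker).filter { !$0.isEmpty }

print(pieces)
```

Like the NSRegularExpression version above, this keeps the trailing space on all but the last piece.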

1 Comment

+1 for supplying the UTF-16 (NSString) length. That worked for me where plain count truncated the result by the difference between the UTF-16 length and the grapheme (Character) count (which are usually the same in some languages).
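To illustrate the comment above, here is a small sketch with an assumed sample string whose Character, UTF-16 and UTF-8 lengths all differ. NSRegularExpression measures NSRange in UTF-16 code units, so a search range built from count stops too early:

```swift
import Foundation

// Assumed sample: thumbs-up + skin-tone modifier (one Character, 4 UTF-16 units), a space, "café"
let s = "\u{1F44D}\u{1F3FD} caf\u{E9}"
let regex = try! NSRegularExpression(pattern: "caf.")

// Length from s.count (Characters) covers only the first 6 UTF-16 units
let truncated = regex.matches(in: s, range: NSRange(location: 0, length: s.count))
// Length from s.utf16.count covers the whole string
let full = regex.matches(in: s, range: NSRange(location: 0, length: s.utf16.count))

print(s.count, s.utf16.count, s.utf8.count)  // 6 9 14
print(truncated.count, full.count)           // 0 1

// Converting the NSRange back with Range(_:in:) recovers the matched text
if let m = full.first, let r = Range(m.range, in: s) {
    print(s[r])                              // café
}
```

The same reasoning applies to any code that builds an NSRange from utf8.count: it only happens to work for ASCII text.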

I needed something that works more like JavaScript's String.prototype.split(regexp) or Rust's str::splitn(n, pat), but with a regular expression. I ended up with this:

import Foundation

extension NSRegularExpression {
    /// Non-throwing convenience initializer used by the tests below.
    /// (The original body was elided; a force-try is assumed here, so an invalid pattern traps.)
    convenience init(_ pattern: String) {
        try! self.init(pattern: pattern)
    }

    /// An array of substrings of the given string, separated by this regular expression, restricted to returning at most n items.
    /// If n substrings are returned, the last substring (the nth substring) will contain the remainder of the string.
    /// - Parameter str: String to be matched
    /// - Parameter n: If `n` is specified and n != -1, the string is split into at most n elements; otherwise it is split at every occurrence of this pattern
    func splitn(_ str: String, _ n: Int = -1) -> [String] {
        // NSRange offsets count UTF-16 code units, so use utf16.count
        // (utf8.count overshoots on non-ASCII text and makes Range(_:in:) fail)
        let length = str.utf16.count
        let matches = self.matches(in: str, range: NSRange(location: 0, length: length))

        if (n != -1 && n < 2) || matches.isEmpty { return [str] }
        var result = [String]()

        // Text before the first match ("" when the string starts with a match)
        if let first = matches.first?.range {
            if first.location == 0 {
                result.append("")
            } else {
                let _range = NSRange(location: 0, length: first.location)
                result.append(String(str[Range(_range, in: str)!]))
            }
        }

        // Text between consecutive matches
        for (cur, next) in zip(matches, matches[1...]) {
            let loc = cur.range.location + cur.range.length
            if n != -1 && result.count + 1 == n {
                // Limit reached: the last piece holds the rest of the string
                let _range = NSRange(location: loc, length: length - loc)
                result.append(String(str[Range(_range, in: str)!]))
                return result
            }
            let _range = NSRange(location: loc, length: next.range.location - loc)
            result.append(String(str[Range(_range, in: str)!]))
        }

        // Text after the last match ("" when the string ends with a match)
        if let last = matches.last?.range, !(n != -1 && result.count >= n) {
            let lastIndex = last.location + last.length
            if lastIndex == length {
                result.append("")
            } else {
                let _range = NSRange(location: lastIndex, length: length - lastIndex)
                result.append(String(str[Range(_range, in: str)!]))
            }
        }

        return result
    }
}

It passes the following tests:

func testRegexSplit() {
    XCTAssertEqual(NSRegularExpression("\\s*[.]\\s+").splitn("My . Love"), ["My", "Love"])
    XCTAssertEqual(NSRegularExpression("\\s*[.]\\s+").splitn("My . Love . "), ["My", "Love", ""])
    XCTAssertEqual(NSRegularExpression("\\s*[.]\\s+").splitn(" . My . Love"), ["", "My", "Love"])
    XCTAssertEqual(NSRegularExpression("\\s*[.]\\s+").splitn(" . My . Love . "), ["", "My", "Love", ""])
    XCTAssertEqual(NSRegularExpression("xX").splitn("xXMyxXxXLovexX"), ["", "My", "", "Love", ""])
}

func testRegexSplitWithN() {
    XCTAssertEqual(NSRegularExpression("xX").splitn("xXMyxXxXLovexX", 1), ["xXMyxXxXLovexX"])
    XCTAssertEqual(NSRegularExpression("xX").splitn("xXMyxXxXLovexX", -1), ["", "My", "", "Love", ""])
    XCTAssertEqual(NSRegularExpression("xX").splitn("xXMyxXxXLovexX", 2), ["", "MyxXxXLovexX"])
    XCTAssertEqual(NSRegularExpression("xX").splitn("xXMyxXxXLovexX", 3), ["", "My", "xXLovexX"])
    XCTAssertEqual(NSRegularExpression("xX").splitn("xXMyxXxXLovexX", 4), ["", "My", "", "LovexX"])
}

func testNoMatches() {
    XCTAssertEqual(NSRegularExpression("xX").splitn("MyLove", 1), ["MyLove"])
    XCTAssertEqual(NSRegularExpression("xX").splitn("MyLove"), ["MyLove"])
    XCTAssertEqual(NSRegularExpression("xX").splitn("MyLove", 3), ["MyLove"])
}
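For comparison, when the separator is a single Character, the standard library's split(separator:maxSplits:omittingEmptySubsequences:) already has similar limit semantics: splitn(str, n) corresponds to maxSplits: n - 1, and omittingEmptySubsequences: false keeps the empty leading and trailing pieces that the tests above expect. A short sketch with assumed sample strings:

```swift
// Assumed sample with an empty piece between the two commas
let s = "a,b,,c"

// maxSplits: 1 yields at most 2 pieces; the last one holds the remainder
let two = s.split(separator: ",", maxSplits: 1, omittingEmptySubsequences: false)
// No limit, keeping empty pieces
let all = s.split(separator: ",", omittingEmptySubsequences: false)
// Leading and trailing separators produce empty pieces, like splitn
let edges = ",a,".split(separator: ",", omittingEmptySubsequences: false)

print(two)    // ["a", "b,,c"]
print(all)    // ["a", "b", "", "c"]
print(edges)  // ["", "a", ""]
```

Note that the default omittingEmptySubsequences: true drops the empty pieces, which is the opposite of splitn's behavior.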

2 Comments

I found this crashed for me if I provided a string that had 0 matches to the pattern. I ended up using this instead: gist.github.com/hcrub/218e1d25f1659d00b7f77aebfcebf15a
@Patrick I have fixed this and also added test cases for it

Even though the question is a bit old, here is a variant that uses RegexComponent from iOS 16 and macOS 13, which lets you write something like "a b c".split(regex: /\s+/):

@available(macOS 13.0, iOS 16.0, *)
public extension StringProtocol {
    func split<R>(
        regex: R, maxSplits: Int = Int.max, omittingEmptySubsequences: Bool = true
    ) -> [Substring] where R: RegexComponent, SubSequence == Substring {
        guard maxSplits > 0 else { return [self[startIndex...]] }
        // Start of the piece currently being collected
        var pieceStart = self.startIndex
        var result = [Substring]()

        for m in matches(of: regex) {
            let substring = self[pieceStart..<m.range.lowerBound]

            if !omittingEmptySubsequences || !substring.isEmpty {
                result.append(substring)
            }
            pieceStart = m.range.upperBound
            if result.count >= maxSplits {
                break
            }
        }
        // Remainder after the last match; keep it even when empty
        // unless empty pieces are being omitted
        if pieceStart < endIndex || !omittingEmptySubsequences {
            result.append(self[pieceStart...])
        }
        return result
    }
}
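Note that split(regex:) removes the matched text, while the question wants to keep each course code. A minimal sketch for that case, assuming Swift 5.7+ (macOS 13 / iOS 16) and the same assumed pattern and trimming as above, collects the matches with matches(of:) instead of splitting:

```swift
import Foundation

let input = "323 ECO Economics Course 451 ENG English Course 789 Mathematical Topography"
var pieces: [String] = []

if #available(macOS 13.0, iOS 16.0, *) {
    // Regex(_:) compiles the pattern at run time and throws on invalid syntax
    let course = try! Regex("[0-9]{3}[A-Za-z ]+")
    // matches(of:) yields every non-overlapping match, left to right;
    // trimming drops the space that precedes the next course number
    pieces = input.matches(of: course).map {
        input[$0.range].trimmingCharacters(in: .whitespaces)
    }
}

print(pieces)
```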
