Is it possible to extract just part of the HTML string that I get back when I load a URL?

Example code below:

    let myURLString = "https://myUrl.something"
    guard let myURL = NSURL(string: myURLString) else {
        print("Error: \(myURLString) doesn't seem to be a valid URL")
        return
    }
    do {
        let myHTMLString = try String(contentsOf: myURL as URL)
        let htmlString = String(myHTMLString)
        print("HTML: \(myHTMLString)")
    } catch let error as NSError {
        print("Error: \(error)")
    }

I want to extract everything between the opening tag <h3 class="post-title"> and the closing </h3>.

I know I should probably use a regular expression, but I don't really know how to write it correctly. I tried something like this:

    let myURLString = "https://www.fvgtech.it/category/podcast/"
    guard let myURL = NSURL(string: myURLString) else {
        print("Error: \(myURLString) doesn't seem to be a valid URL")
        return
    }
    do {
        let myHTMLString = try String(contentsOf: myURL as URL)
        let htmlString = String(myHTMLString)

        if let match = htmlString.range(of: "(<h3.+)", options: .regularExpression) {
            print("Found", htmlString.substring(with: match))
        }

        print("HTML: \(myHTMLString)")
    } catch let error as NSError {
        print("Error: \(error)")
    }

But it prints only <h3 class="post-title"> and not the content between the tags. Thanks in advance!

  • @a.masri thanks for the reply! If I use what you wrote, it matches only up to the first bracket, but if I change it to <h3[^>].*> it again shows me just <h3 class="post-title"> and not what's between the <h3 and the </h3> Commented May 6, 2018 at 19:29
  • Look at this regex, regex101.com/r/CV67Yl/1, and check the result in group 1; this is what you want. Commented May 6, 2018 at 19:38
  • I checked it, but Xcode returns "Invalid escape sequence in literal". Commented May 6, 2018 at 19:43
  • OK, try this: String(htmlString.filter { !" \n\t\r".contains($0) }).range(of: "<h3.*?>(.+)((.*)+(.+))+</h3>", options: .regularExpression) But I do not advise using this method because it will take a very long time; it is better to use this library: github.com/scinfu/SwiftSoup Commented May 6, 2018 at 20:30
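
A note on the "Invalid escape sequence in literal" error mentioned in the comments: it usually means a regex backslash was pasted straight into an ordinary Swift string literal. Patterns copied from tools such as regex101 often contain sequences like `\/` or `\s`, which Swift does not recognise as string escapes. The sketch below assumes that was the cause here (the linked pattern itself is not reproduced in this thread, so `<\/h3>` is only an illustrative stand-in) and shows the two usual ways to write such a pattern.

```swift
import Foundation

// Illustrative pattern only: a closing </h3> tag written with an escaped slash.
// In an ordinary literal, every regex backslash has to be doubled.
let escaped = "<\\/h3>"

// In a raw string literal (Swift 5 and later), backslashes are kept as written.
let raw = #"<\/h3>"#

// Both literals hold exactly the same pattern text.
print(escaped == raw)

let sample = "<h3 class=\"post-title\">Title</h3>"
if let range = sample.range(of: raw, options: .regularExpression) {
    print("Matched:", sample[range])
}
```

Either form hands the same pattern to the regex engine; the raw literal is simply easier to compare against what the online tool shows.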

1 Answer

We just need to collect every substring between a start string and an end string; see the String extension below.

    let myURLString = "https://www.fvgtech.it/category/podcast/"
    guard let myURL = URL(string: myURLString) else {
        print("Error: \(myURLString) doesn't seem to be a valid URL")
        return
    }
    do {
        let htmlString = try String(contentsOf: myURL)
        // Collect everything between each opening <h3 class="post-title"> and its closing </h3>.
        print(htmlString.allStringsBetween(start: "<h3 class=\"post-title\">", end: "</h3>"))
    } catch {
        print("Error: \(error)")
    }

Extension for String

    import Foundation

    extension String {

        /// Returns every substring found between `start` and `end`, in order of appearance.
        func allStringsBetween(start: String, end: String) -> [String] {
            // NSRange offsets are UTF-16 based, so measure with NSString.length rather than count.
            let source = self as NSString
            var strings = [String]()
            var startRange = source.range(of: start)

            while startRange.location != NSNotFound {
                // Look for the end marker only in the text after the start marker.
                let contentStart = startRange.location + startRange.length
                let tail = NSRange(location: contentStart, length: source.length - contentStart)
                let endRange = source.range(of: end, options: [], range: tail)
                guard endRange.location != NSNotFound else { break }

                let content = NSRange(location: contentStart, length: endRange.location - contentStart)
                strings.append(source.substring(with: content))

                // Continue scanning after the end marker.
                let nextStart = endRange.location + endRange.length
                let rest = NSRange(location: nextStart, length: source.length - nextStart)
                startRange = source.range(of: start, options: [], range: rest)
            }
            return strings
        }
    }
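
For comparison, the regular-expression route the question started with can also work. The original pattern `(<h3.+)` most likely stopped at the end of the opening tag because `.` does not match line breaks by default and the tag's content sits on the following lines. The sketch below is one possible way to do it, not part of the extension above: `postTitles(in:)` is a hypothetical helper name, and it assumes the opening tag is written exactly as `<h3 class="post-title">`. For anything less regular than that, an HTML parser such as SwiftSoup (suggested in the comments) is the safer choice.

```swift
import Foundation

/// Hypothetical helper: returns the trimmed content of every
/// <h3 class="post-title"> ... </h3> element found in `html`.
func postTitles(in html: String) -> [String] {
    // Raw string literal, so quotes and slashes need no escaping.
    // (.*?) is a lazy capture group: it stops at the first closing </h3>.
    let pattern = #"<h3 class="post-title">(.*?)</h3>"#

    // .dotMatchesLineSeparators lets "." match newlines inside the element.
    guard let regex = try? NSRegularExpression(pattern: pattern,
                                               options: [.dotMatchesLineSeparators]) else {
        return []
    }

    let wholeRange = NSRange(html.startIndex..<html.endIndex, in: html)
    return regex.matches(in: html, options: [], range: wholeRange).compactMap { match -> String? in
        // Capture group 1 holds the text between the tags.
        guard let range = Range(match.range(at: 1), in: html) else { return nil }
        return html[range].trimmingCharacters(in: .whitespacesAndNewlines)
    }
}

let sampleHTML = """
<h3 class="post-title">
  <a href="x" rel="bookmark">One</a>
</h3>
<p>skip</p>
<h3 class="post-title">Two</h3>
"""
print(postTitles(in: sampleHTML))
```

Converting between `NSRange` and `Range<String.Index>` through the `in:` initialisers avoids mixing UTF-16 offsets with `Character` counts.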
5 Comments

But with this I only get what's inside <h3 >, with the result " class="post-title" ". What I actually have is something like <h3 class="post-title"> <a href="fvgtech.it/…" rel="bookmark"> 18 &#8211; Fake News Festival e cosa sono le bufale. Ospite Gabriele Franco </a> </h3>, and I want to grab what's between the two tags, <h3> ... </h3>.
Do you want to grab everything between the h3 tags?
Yes, so this: <a href="fvgtech.it/…" rel="bookmark"> 18 &#8211; Fake News Festival e cosa sono le bufale. Ospite Gabriele Franco </a>
@PietroMessineo I have updated the code; check it and tell me the results.
That's great! Thank you so much!