I am trying to parse an XML file using NSXMLParser. Everything seems to work at first, but the text content comes back truncated and split in an odd way.

func parser(parser: NSXMLParser!, didStartElement elementName: String!, namespaceURI: String!, qualifiedName qName: String!, attributes attributeDict: [NSObject : AnyObject]!) {
    if elementName == "title" {
        foundTitle = true
    }

    if elementName == "description" {
        foundDescription = true
    }
}

func parser(parser: NSXMLParser!, foundCharacters string: String!) {
    if (foundItem) {
        if foundTitle {
            println("Title: \(string)")
            foundTitle = false
        }
        else if foundDescription {
            println("Description: \(string)")
            foundDescription = false
        }
    }
}

The RSS feed I am testing with is This Day in Tech History (http://feedpress.me/ThisDayInTechHistory), and right now the first news item contains the following:

Title: IBM’s First Desktop Computer
Description: IBM introduces their System/23 Datamaster desktop computer...

But this is what my test prints:

Title: IBM
Description: ’s First Desktop Computer
Description: July 28, 1981 IBM introduces their System/23 Datamaster desktop computer...

Note that the title was truncated at the first ’ and the rest of it was reported as a description! Is this a bug in NSXMLParser, or have I done something wrong? Thanks!

3 Answers

2

Your guess is correct! NSXMLParser assumes that the string has already been escaped, and it runs into issues with characters such as >, <, ', &, and \.

To do a global replace on a string, you can use the NSString method stringByReplacingOccurrencesOfString, like so:

let xml = "<description>Here's a malformed XML string. Ain't it ugly?</description>"
xml.stringByReplacingOccurrencesOfString("'", withString: "&quot;")

Which returns:

"<description>Here&quot;s a malformed XML string. Ain&quot;t it ugly?</description>"

2 Comments

Thanks for helping, but I am not sure whether my guess is correct. I ran a test and found that this is indeed what the parser returns: the title is split into a title and a description! I have improved the code shown above.
Do you mean I should load the XML into a String first and replace all occurrences before passing it to NSXMLParser? The problem is that it takes 10 seconds to load the whole XML file and replace all occurrences. Also, xml.stringByReplacingOccurrencesOfString("'", withString: "&quot;") does not seem to do a proper replacement; the output was corrupted.
2

Lim Thye Chean's answer is correct, but here's the problem in your code:

foundTitle = false

You see, foundCharacters stops at the first ’ it encounters and hands you only the text up to that point. Then you set foundTitle = false. So when foundCharacters is called again with the remaining part of the string, that part is no longer treated as title text (because foundTitle = false).
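To see this chunking for yourself, here is a minimal sketch that records every string handed to foundCharacters. It assumes current Swift, where NSXMLParser is named XMLParser; ChunkRecorder and recordChunks are hypothetical names made up for this illustration. How many pieces you get, and where the splits fall, is up to the parser, so the only thing you can rely on is that the pieces concatenate back to the full text.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives in this module on Linux
#endif

// Hypothetical helper: stores each chunk exactly as the parser delivers it.
final class ChunkRecorder: NSObject, XMLParserDelegate {
    private(set) var chunks: [String] = []

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        chunks.append(string)
    }
}

// Parses the given XML text and returns the chunks in delivery order.
func recordChunks(in xml: String) -> [String] {
    let recorder = ChunkRecorder()
    let parser = XMLParser(data: Data(xml.utf8))
    parser.delegate = recorder
    parser.parse()
    return recorder.chunks
}

let chunks = recordChunks(in: "<title>IBM’s First Desktop Computer</title>")
for chunk in chunks {
    print("chunk: \(chunk)")
}
```

If this prints more than one chunk for the title, that is the same split you are seeing in your output.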

The best solution, IMHO, is to use these three delegate methods:

1) In didStartElement you should reset a temporary variable, such as var entryTitle = String() (so we clear out this string every time the parser hits didStartElement for "title")

2) foundCharacters is called multiple times, stopping at many "uncommon" characters. We need to append each found string to our temporary variable. So inside foundCharacters we should say: entryTitle += string (to append to our variable all the little bits of string the parser finds separately)

3) Only when the parser didEndElement "title" should we assume that we have the "title" String completed. So it's here that we should say foundTitle = false, and also here that you should println(entryTitle)
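The three steps above can be sketched roughly as follows. This is an illustration written against current Swift (XMLParser and XMLParserDelegate instead of NSXMLParser), not the exact code from the question; TitleCollector, collectTitles, insideItem and insideTitle are hypothetical names, while entryTitle follows the naming used in the steps.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives in this module on Linux
#endif

// Hypothetical delegate: collects the complete text of each <title> inside an <item>.
final class TitleCollector: NSObject, XMLParserDelegate {
    private(set) var titles: [String] = []
    private var insideItem = false
    private var insideTitle = false
    private var entryTitle = ""

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String] = [:]) {
        if elementName == "item" { insideItem = true }
        if insideItem && elementName == "title" {
            insideTitle = true
            entryTitle = ""  // step 1: clear the buffer when the element starts
        }
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        // step 2: may be called several times per element, so append
        if insideTitle { entryTitle += string }
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        if insideTitle && elementName == "title" {
            titles.append(entryTitle)  // step 3: the text is complete only here
            insideTitle = false
        }
        if elementName == "item" { insideItem = false }
    }
}

// Parses the given XML text and returns the item titles in document order.
func collectTitles(from xml: String) -> [String] {
    let collector = TitleCollector()
    let parser = XMLParser(data: Data(xml.utf8))
    parser.delegate = collector
    parser.parse()
    return collector.titles
}
```

Calling collectTitles(from:) on a feed string should give one entry per item, no matter how many pieces foundCharacters delivered for each title. The same pattern extends to "description" with a second buffer.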

I hope that helps. I've struggled a lot with the XMLParser, so I've written a short tutorial on understanding how it works: https://medium.com/@lucascerro/understanding-nsxmlparser-in-swift-xcode-6-3-1-7c96ff6c65bc

1

I found the issue. After the parser enters the element "item", the text of contained elements like "title" or "description" can be reported multiple times! So "IBM’s First Desktop Computer" is split into 2 title strings, and we need to combine them in variables and only construct the result when the element ends.

So the new code works like this:

func parser(parser: NSXMLParser!, didStartElement elementName: String!, namespaceURI: String!, qualifiedName qName: String!, attributes attributeDict: [NSObject : AnyObject]!) {
    element = elementName

    if element == "item" {
        isItem = true
        titleText = ""
        ...
    }
}

// Get element text

func parser(parser: NSXMLParser!, foundCharacters string: String!) {
    if isItem {
        if element == "title" {
            titleText += string
        }

        ...
    }
}

// Construct HTML when element end

func parser(parser: NSXMLParser!, didEndElement elementName: String!, namespaceURI: String!, qualifiedName qName: String!) {
    if elementName == "item" {
        html += "<b>\(titleText)</b>"
        ...
    }
}

This works!
