Could somebody please show me a simple example of parsing some HTML using libxml?

#import <libxml2/libxml/HTMLparser.h>

NSString *html = @"<ul>"
    "<li><input type=\"image\" name=\"input1\" value=\"string1value\" /></li>"
    "<li><input type=\"image\" name=\"input2\" value=\"string2value\" /></li>"
  "</ul>"
  "<span class=\"spantext\"><b>Hello World 1</b></span>"
  "<span class=\"spantext\"><b>Hello World 2</b></span>";

1) Say I want to get the value of the input whose name is input2.

This should output "string2value".

2) Say I want to get the inner contents of each span tag whose class is spantext.

This should output "Hello World 1" and "Hello World 2".

  • libxml is for xml parsing and for that you need to see TouchXML. Commented Jun 11, 2010 at 7:56
  • Even though I'm using HTMLparser.h? I'll take a look at TouchXML thanks. Commented Jun 11, 2010 at 8:37
  • @Ayaz: libxml2 supports HTML4 parsing. From the sparse documentation of TouchXML, it seems it doesn't, so it's not appropriate in this instance. Commented Jun 11, 2010 at 10:17
  • TouchXML defines CXMLDocumentTidyHTML in its CXMLDocument.h file, which suggests this problem could also be solved with TouchXML. You could also look at KissXML, which is inspired by TouchXML. For a pure HTML parser, I just found touchtank.wordpress.com/element-parser; see if it fits your needs. Commented Jun 11, 2010 at 10:29
  • github.com/zootreeves/Objective-C-HMTL-Parser Did what I wanted, thanks v much for your help. Commented Jun 11, 2010 at 14:12

2 Answers

I used Ben Reeves' HTML Parser to achieve what I wanted:

NSError *error = nil;
NSString *html = 
    @"<ul>"
        "<li><input type='image' name='input1' value='string1value' /></li>"
        "<li><input type='image' name='input2' value='string2value' /></li>"
    "</ul>"
    "<span class='spantext'><b>Hello World 1</b></span>"
    "<span class='spantext'><b>Hello World 2</b></span>";
HTMLParser *parser = [[HTMLParser alloc] initWithString:html error:&error];

if (error) {
    NSLog(@"Error: %@", error);
    [parser release]; // avoid leaking the parser on the early return (manual reference counting)
    return;
}

HTMLNode *bodyNode = [parser body];

NSArray *inputNodes = [bodyNode findChildTags:@"input"];

for (HTMLNode *inputNode in inputNodes) {
    if ([[inputNode getAttributeNamed:@"name"] isEqualToString:@"input2"]) {
        NSLog(@"%@", [inputNode getAttributeNamed:@"value"]); //Answer to first question
    }
}

NSArray *spanNodes = [bodyNode findChildTags:@"span"];

for (HTMLNode *spanNode in spanNodes) {
    if ([[spanNode getAttributeNamed:@"class"] isEqualToString:@"spantext"]) {
        NSLog(@"%@", [spanNode allContents]); //Answer to second question
    }
}

[parser release];

3 Comments

I know this is old, but I'm pretty sure he wants "allContents" and not "rawContents"
@StuR Does his library work for iPhone development on iOS 6 as well?
@Odelya I should think so, although I haven't tested it. You may need to set the -fno-objc-arc compiler flag for its files.

As Vladimir said, for the second point it's important to use allContents rather than rawContents. rawContents returns the node's raw HTML, tags included, i.e.:

<span class='spantext'><b>Hello World 1</b></span>
