
I have a problem with character encoding when parsing a web page with hpple in Xcode.

- (void)loadTutorials {
    NSURL *tutorialsUrl = [NSURL URLWithString:@"http://qrz.si/members/s55db/"];
    NSData *tutorialsHtmlData = [NSData dataWithContentsOfURL:tutorialsUrl options:NSASCIIStringEncoding error:nil];

    TFHpple *tutorialsParser = [TFHpple hppleWithHTMLData:tutorialsHtmlData];

    NSString *tutorialsXpathQueryString = @"//td[@class='data']";
    NSArray *tutorialsNodes = [tutorialsParser searchWithXPathQuery:tutorialsXpathQueryString];

    NSMutableArray *newTutorials = [[NSMutableArray alloc] initWithCapacity:0];
    for (TFHppleElement *element in tutorialsNodes) {
        Tutorial *tutorial = [[Tutorial alloc] init];
        [newTutorials addObject:tutorial];

        for (TFHppleElement *child in element.children) {
            if ([child.tagName isEqualToString:@"img"]) {
                // NSLog(@"%@", [child objectForKey:@"src"]);
            } else if ([child.tagName isEqualToString:@"p"]) {
                // NSLog(@"%@", [[child firstChild] content]);
                tutorial.title = [[child firstChild] content];
            }
        }
    }

    _objects = newTutorials;
    [self.tableView reloadData];
}

The page should be UTF-8, as its HTML source indicates, but I get weird characters in the output.

How can I force a different encoding for the data? Any help would be highly appreciated!
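NSData is only raw bytes and carries no encoding of its own, so "changing the encoding of the data" really means decoding the bytes with the encoding they were actually written in and re-encoding the text as UTF-8 before handing it to hpple. A minimal sketch, assuming ARC and assuming you know or suspect the real source encoding (for example Windows-1250, NSWindowsCP1250StringEncoding, which is only a guess here and not confirmed by the question). The helper name ReencodeAsUTF8 is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (assumes ARC): decode the bytes using the encoding
// they were really written in, then re-encode the resulting text as UTF-8.
// Returns nil if the bytes are not valid in sourceEncoding.
static NSData *ReencodeAsUTF8(NSData *raw, NSStringEncoding sourceEncoding) {
    NSString *text = [[NSString alloc] initWithData:raw encoding:sourceEncoding];
    if (text == nil) {
        return nil;
    }
    return [text dataUsingEncoding:NSUTF8StringEncoding];
}
```

The result could then be passed to [TFHpple hppleWithHTMLData:] in place of the original data.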

2 Answers

Answer 1
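options:NSASCIIStringEncoding is useless here, and the documentation points out that it is not the right way to go: the options: parameter of dataWithContentsOfURL:options:error: takes NSDataReadingOptions (flags that control how the data is read), not a string encoding.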

To set the encoding, you have to edit XPathQuery.m by Matt Gallagher, which came with the same tutorial. The changes had a visible effect, but none of them fixed the problem, as the site was clearly UTF-8 encoded.
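The answer does not show the edit itself. As far as I know, the XPathQuery.m shipped with hpple at the time parses HTML with libxml2's htmlReadMemory, whose fourth argument is the encoding name (NULL lets libxml2 detect the charset). A hypothetical sketch of that kind of change, passing an explicit encoding instead; the helper name ParseHTMLWithEncoding is an assumption, and the exact line in your copy of XPathQuery.m may differ. It needs the same libxml2 setup hpple already requires (header search path /usr/include/libxml2 and linking libxml2).

```objc
#import <Foundation/Foundation.h>
#include <libxml/HTMLparser.h>
#include <libxml/tree.h>

// Hypothetical sketch: parse HTML bytes with an explicit encoding name
// (e.g. "UTF-8") instead of NULL, so libxml2 does not guess the charset.
// The caller must free the returned document with xmlFreeDoc().
static htmlDocPtr ParseHTMLWithEncoding(NSData *document, const char *encoding) {
    return htmlReadMemory([document bytes], (int)[document length], "", encoding,
                          HTML_PARSE_NOWARNING | HTML_PARSE_NOERROR);
}
```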

The problem turned out to be on the server side, and the administrator offered me good old plain XML instead :)


Answer 2

You are telling the NSData object that the contents of the URL you are loading are ASCII, not UTF-8:

NSData *tutorialsHtmlData = [NSData dataWithContentsOfURL:tutorialsUrl options:NSASCIIStringEncoding error:nil];

It should be:

NSData *tutorialsHtmlData = [NSData dataWithContentsOfURL:tutorialsUrl options:NSUTF8StringEncoding error:nil];
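An NSStringEncoding constant such as NSUTF8StringEncoding only takes effect where bytes are turned into text, for example when creating an NSString from the downloaded data. A minimal sketch of a diagnostic, assuming ARC, that checks whether the page bytes decode as UTF-8 at all before they reach hpple; the helper name LooksLikeUTF8 is hypothetical. Passing the check does not prove the page was meant to be UTF-8 (plain ASCII also passes), but failing it shows the bytes are not UTF-8.

```objc
#import <Foundation/Foundation.h>

// Hypothetical check (assumes ARC): NSString returns nil when the bytes
// are not valid in the requested encoding, so this reports whether the
// downloaded page decodes as UTF-8.
static BOOL LooksLikeUTF8(NSData *data) {
    NSString *text = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];
    return text != nil;
}
```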

2 Comments

I have also tried NSUTF8StringEncoding, but the encoding does not change; the funny characters are still there :(
I've tried copying the whole table to a page at another link, and if I parse that link, the UTF-8 text is read properly. But if I parse the original site, the UTF-8 breaks.
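Since a copy of the table parses correctly while the original page does not, one thing worth checking is which charset the server itself declares in its HTTP Content-Type header, as opposed to the page's meta tag. NSURLResponse exposes that header value as textEncodingName. A minimal sketch, assuming you obtain the response from a URL loading API (for example an NSURLSession data task's completion handler); the helper name EncodingFromCharsetName and the UTF-8 fallback are assumptions.

```objc
#import <Foundation/Foundation.h>
#import <CoreFoundation/CoreFoundation.h>

// Hypothetical helper: map a charset name, such as NSURLResponse's
// textEncodingName (taken from the HTTP Content-Type header), to an
// NSStringEncoding. Falls back to UTF-8 when the name is nil or unknown.
static NSStringEncoding EncodingFromCharsetName(NSString *charsetName) {
    if (charsetName == nil) {
        return NSUTF8StringEncoding;
    }
    CFStringEncoding cfEncoding =
        CFStringConvertIANACharSetNameToEncoding((__bridge CFStringRef)charsetName);
    if (cfEncoding == kCFStringEncodingInvalidId) {
        return NSUTF8StringEncoding;
    }
    return CFStringConvertEncodingToNSStringEncoding(cfEncoding);
}
```

If the header names a different charset than the page's meta tag, that mismatch could explain the behavior described in this comment.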
