Reading and getting values from an HTML string

Question

I have an interesting scenario and need to know if it can even be done in Cocoa.

I have tried and failed to find a free hourly weather forecast API that will give me an hour-by-hour forecast for a given city or ZIP code. As an alternative, I am trying to read the whole HTML page source and strip out the hourly weather portion so that I can use it in my iPhone app.

NSString *request = @"http://www.findlocalweather.com/hourly/il/chicago.html";

NSURL *URL = [NSURL URLWithString:request];
NSError *error;
NSString *HTML = [NSString stringWithContentsOfURL:URL encoding:NSASCIIStringEncoding error:&error];

NSLog(@"HTML: %@", HTML);

If you open http://www.findlocalweather.com/hourly/il/chicago.html you will see the hourly forecast laid out as a grid. From that HTML source I need to read each date, cloud condition and temperature and put them into arrays, e.g.:

NSMutableArray1 will contain objects "AUG 05 9:00 AM, AUG 05 10:00 AM, AUG 05 11:00 AM ..."

NSMutableArray2 will contain objects "Mostly Cloudy, Mostly Sunny ..."

NSMutableArray3 will contain objects "73, 84, 76, 91 ...." (temp in degrees)

Can this be done? Has anyone tried parsing an HTML page source string to pull out just the parts they need?

Yes, it can be done. But what would you do if "findlocalweather.com" decides to change their page layout? You would have to re-release your app.
– James Webster, Commented Aug 5, 2012 at 14:58
I understand, and I am OK with that. Can you let me know how to do this?
– Sam B, Commented Aug 5, 2012 at 15:04

arnoapp · Accepted Answer · 2012-08-05 22:13:50Z

1

You could do it easily with NSRegularExpression:

NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:EnterStringWithPatternHere options:0 error:NULL];
NSArray *matches = [regex matchesInString:webPage options:0 range:NSMakeRange(0, [webPage length])];

There are some nice tutorials on how to use regular expressions. (They are almost the same in most programming languages, but watch out for the peculiarities of NSRegularExpression.)

Example: parsing PDF links out of an HTML file.

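// The dot before "pdf" is escaped so it only matches a literal "."; quotes are excluded
// from the group so a match cannot run past the end of one href value.
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"href=\"([^\"<>]*)\\.pdf\"" options:0 error:NULL];
NSArray *matches = [regex matchesInString:webPage options:0 range:NSMakeRange(0, [webPage length])];
for (NSTextCheckingResult *match in matches) {
    // Range 1 is the parenthesised group, i.e. the link without ".pdf", so the extension is appended again.
    NSString *theFullString = [NSString stringWithFormat:@"%@.pdf", [webPage substringWithRange:[match rangeAtIndex:1]]];
    NSLog(@"%@", theFullString);
}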

I wrote this code a while ago. I would advise you to experiment a bit and make use of NSLog or breakpoints; this will help a lot. It takes some time to get into regular expressions, but they work very well.

edited Aug 5, 2012 at 22:13

answered Aug 5, 2012 at 15:38

arnoapp


5 Comments

Sam B Over a year ago

@azzurrl, can you please give an example? I tried using NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"span class=\"copy\"" options:0 error:NULL]; NSArray *matches = [regex matchesInString:XML options:0 range:NSMakeRange(0, [HTML length])]; but that gives me an array with weird results i.e. i=0 ... <NSSimpleRegularExpressionCheckingResult: 0x6b4ac40>{10055, 17}{<NSRegularExpression: 0x6b4a700> span class="copy" 0x0} - with exact same lines over and over

arnoapp Over a year ago

You need to use NSTextCheckingResult. Sorry, I should have mentioned this. I will edit the answer above.

arnoapp Over a year ago

Also, I'm not sure you are using the RegEx properly. Make use of the placeholder syntax like in the example above. But it could work. Just run some tests ;)

Sam B Over a year ago

Thanks Azzurr1, but this will not work for me: I need to get the text within an HTML tag. In this string, AUG 05 9:00 AM</td> <td bgcolor="#E6E6FF" align="center"><img src="findlocalweather.com/images/fcicons/mcloudy.gif" border="0" width="32" height="35"></td> ... I need to get just the AUG 05 9:00 AM text, which sits between tags.

arnoapp Over a year ago

Of course this will work. Just look at the example. You need to specify your RegEx. As you already said, you want what's between the span tags. (In fact, I did nothing different with the PDF files: I just wanted what's inside <a href...> </a>.) You could do it like this:

NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"<span class=\"copy\">([^<>]*)</span>" options:0 error:NULL];

The ([^<>]*) part is a placeholder (a capture group): it matches any run of characters except < and >. Just try to understand what a RegEx does.
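To make the placeholder idea concrete, here is a minimal sketch of pulling the captured text into an array. The helper name CapturedStrings and the class="time" markup used with it are made up for illustration; the real page's tags and attributes have to be checked in its source, and the sketch assumes ARC.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of Foundation): returns the text of capture
// group 1 for every match of `pattern` in `html`, trimmed of whitespace.
static NSArray<NSString *> *CapturedStrings(NSString *html, NSString *pattern) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return @[];
    }

    NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSMutableArray<NSString *> *results = [NSMutableArray array];
    NSArray<NSTextCheckingResult *> *matches =
        [regex matchesInString:html options:0 range:NSMakeRange(0, html.length)];

    for (NSTextCheckingResult *match in matches) {
        // Range 0 is the whole match; range 1 is the first parenthesised group.
        if (match.numberOfRanges < 2) {
            continue;  // the pattern has no capture group
        }
        NSRange groupRange = [match rangeAtIndex:1];
        if (groupRange.location == NSNotFound) {
            continue;  // the group did not take part in this match
        }
        NSString *text = [html substringWithRange:groupRange];
        [results addObject:[text stringByTrimmingCharactersInSet:whitespace]];
    }
    return results;
}
```

For example, calling CapturedStrings(html, @"<td class=\"time\">([^<>]*)</td>") on markup containing <td class="time">AUG 05 9:00 AM</td> would put AUG 05 9:00 AM into the returned array. One call per column (time, clouds, temperature) gives the three arrays asked for in the question.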

SPA · Accepted Answer · 2012-08-05 20:49:57Z

0

The XML parser NSXMLParser can also be used for HTML, as long as the markup is well-formed XML (it stops with an error on things like unclosed tags). It uses delegate methods to process the elements of the document incrementally, so you have to build up the extracted fields yourself. You would have to look at the structure of the HTML document returned to see which elements (tags) to extract, and then put code into the delegate method parser:didEndElement:namespaceURI:qualifiedName: accordingly.
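A rough sketch of that delegate approach, assuming ARC and input that is well-formed XML. The class name CellCollector and the helper CellTexts are hypothetical, and collecting every td element is only an assumption about which tags matter on the real page.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical delegate that collects the text content of every <td> element.
@interface CellCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableArray<NSString *> *cells;
@property (nonatomic, strong) NSMutableString *currentText;
@end

@implementation CellCollector

- (instancetype)init {
    if ((self = [super init])) {
        _cells = [NSMutableArray array];
    }
    return self;
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary<NSString *, NSString *> *)attributeDict {
    if ([elementName isEqualToString:@"td"]) {
        // Start a fresh buffer for this cell.
        self.currentText = [NSMutableString string];
    }
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    // Text can arrive in several chunks; outside a <td> the buffer is nil and this does nothing.
    [self.currentText appendString:string];
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    if ([elementName isEqualToString:@"td"] && self.currentText != nil) {
        NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet];
        [self.cells addObject:[self.currentText stringByTrimmingCharactersInSet:whitespace]];
        self.currentText = nil;
    }
}

@end

// Hypothetical helper: parses `xhtml` and returns the text of each <td> in document order.
static NSArray<NSString *> *CellTexts(NSString *xhtml) {
    NSData *data = [xhtml dataUsingEncoding:NSUTF8StringEncoding];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
    CellCollector *collector = [[CellCollector alloc] init];
    parser.delegate = collector;  // the parser does not retain its delegate
    if (![parser parse]) {
        NSLog(@"Parse failed: %@", parser.parserError);
    }
    return collector.cells;
}
```

Because the parser gives up at the first malformed construct, this only suits pages that are valid XHTML; for looser markup the regular expression route in the other answer may be the more forgiving choice.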

answered Aug 5, 2012 at 20:49

SPA

