I have an interesting scenario and need to know whether it can be done in Cocoa.

I have tried and failed to find a free hourly weather forecast API that gives an hour-by-hour forecast for a given city or ZIP code. As an alternative, I am trying to read the whole HTML page source and strip out the hourly weather portion so that I can use it in my iPhone app.

NSString *request = @"http://www.findlocalweather.com/hourly/il/chicago.html";

NSURL *URL = [NSURL URLWithString:request];
NSError *error = nil;
// Synchronous fetch; returns nil and fills in error on failure.
NSString *HTML = [NSString stringWithContentsOfURL:URL encoding:NSASCIIStringEncoding error:&error];

if (HTML == nil) {
    NSLog(@"Error: %@", error);
} else {
    NSLog(@"HTML: %@", HTML);
}

If you open http://www.findlocalweather.com/hourly/il/chicago.html you will see the hourly forecast grid. From that HTML source I need to read each date, clouds and temp line and put the values into arrays, e.g.:

NSMutableArray1 will contain objects "AUG 05 9:00 AM, AUG 05 10:00 AM, AUG 05 11:00 AM ..."

NSMutableArray2 will contain objects "Mostly Cloudy, Mostly Sunny ..."

NSMutableArray3 will contain objects "73, 84, 76, 91 ...." (temp in degrees)

Can this be done? Has anyone tried parsing an HTML page source string to extract specific content from it?

  • Yes, it can be done. But what would you do if "findlocalweather.com" decides to change their page layout? You would have to re-release your app. Commented Aug 5, 2012 at 14:58
  • I understand, and I am OK with that. Please let me know how I can do this. Commented Aug 5, 2012 at 15:04

2 Answers

You could do it easily with NSRegularExpression:

NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:EnterStringWithPatternHere options:0 error:NULL];
NSArray *matches = [regex matchesInString:webPage options:0 range:NSMakeRange(0, [webPage length])];

There are some nice tutorials on how to use regular expressions. (They are almost the same in most programming languages, but watch out for the peculiarities of NSRegularExpression.)

Example: parsing PDF links out of an HTML file.

// Capture group 1 is the link target without the ".pdf" suffix.
// The dot is escaped so it matches a literal "." only, and the quote is
// excluded from the group so a match cannot run past the end of the attribute.
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"href=\"([^\"<>]*)\\.pdf\"" options:0 error:NULL];
NSArray *matches = [regex matchesInString:webPage options:0 range:NSMakeRange(0, [webPage length])];
for (NSTextCheckingResult *match in matches) {
    NSString *theFullString = [NSString stringWithFormat:@"%@.pdf", [webPage substringWithRange:[match rangeAtIndex:1]]];
    NSLog(@"%@", theFullString);
}

I wrote this code a while ago. I would advise you to experiment a bit and make use of NSLog or breakpoints; this will help a lot. It takes some time to get into regular expressions, but they work very well.

5 Comments

@azzurrl, can you please give an example? I tried using NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"span class=\"copy\"" options:0 error:NULL]; NSArray *matches = [regex matchesInString:HTML options:0 range:NSMakeRange(0, [HTML length])]; but that gives me an array with weird results, e.g. at i=0: <NSSimpleRegularExpressionCheckingResult: 0x6b4ac40>{10055, 17}{<NSRegularExpression: 0x6b4a700> span class="copy" 0x0}, with the same kind of line over and over.
You need to use NSTextCheckingResult. Sorry, I should have mentioned this. I will edit the entry above.
Also, I'm not sure you are using the RegEx properly. Use a capture group like the one in the example above. But it could work. Just run some tests ;)
Thanks Azzurr1, but this will not work for me: I need to get the text inside an HTML tag. In this string <span class="copy">AUG 05<br>9:00 AM</span></td> <td bgcolor="#E6E6FF" align="center"><img src="findlocalweather.com/images/fcicons/mcloudy.gif" border="0" width="32" height="35"></td> ... I need to get just the AUG 05 9:00 AM text, which is between the <span ...> </span> tags.
Of course this will work. Just look at the example. You need to make your RegEx more specific. As you already said, you want what is between the span tags. (In fact, I did nothing different with the PDF files; I just wanted what is inside <a href...> </a>.) You could do it like this: NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"<span class=\"copy\">([^<>]*)</span>" options:0 error:NULL]; The ([^<>]*) part is a capture group that matches any run of characters except < and >. Just try to understand what a RegEx does.
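One caveat about the markup quoted in the comments: the span contains a <br>, and a ([^<>]*) group cannot cross a tag, so that exact span would not match. Below is a minimal sketch of one way around it, assuming the spans are never nested: capture lazily up to the closing tag, then turn <br> into a space. CopySpanTexts is a hypothetical helper name, and whether every cell you need sits inside a span with class "copy" is an assumption to check against the real page.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text of every <span class="copy">...</span>
// in html, with <br> tags replaced by a single space.
static NSArray *CopySpanTexts(NSString *html) {
    NSError *error = nil;
    // (.*?) is lazy, so each match stops at the nearest </span>.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"<span class=\"copy\">(.*?)</span>"
                                                  options:NSRegularExpressionDotMatchesLineSeparators
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Bad pattern: %@", error);
        return [NSArray array];
    }

    NSMutableArray *texts = [NSMutableArray array];
    NSArray *matches = [regex matchesInString:html
                                      options:0
                                        range:NSMakeRange(0, [html length])];
    for (NSTextCheckingResult *match in matches) {
        NSString *inner = [html substringWithRange:[match rangeAtIndex:1]];
        // Replace <br>, <br/> and <br /> (any letter case) with a space.
        NSString *text =
            [inner stringByReplacingOccurrencesOfString:@"<br\\s*/?>"
                                             withString:@" "
                                                options:NSRegularExpressionSearch | NSCaseInsensitiveSearch
                                                  range:NSMakeRange(0, [inner length])];
        text = [text stringByTrimmingCharactersInSet:
                         [NSCharacterSet whitespaceAndNewlineCharacterSet]];
        [texts addObject:text];
    }
    return texts;
}
```

Calling CopySpanTexts(HTML) on the downloaded page would give one array of strings; splitting those into separate date, clouds and temp arrays depends on how the real table is laid out.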

The XML parser NSXMLParser can also be used for HTML, provided the page is well-formed XML (XHTML); it is a strict parser and stops with an error on unclosed tags such as a bare <br>. It uses delegate methods to process the elements of the document incrementally, so you have to build up the extracted fields yourself. You would have to look at the structure of the returned HTML document to see which elements (tags) to extract, and then put code into the delegate method parser:didEndElement:namespaceURI:qualifiedName: accordingly.
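As a rough sketch of that delegate approach, assuming ARC and input that is well-formed XML: the class CopySpanCollector and the function CollectCopySpans are hypothetical names, and the choice of <span class="copy"> as the target element is taken from the markup quoted in the comments on the other answer. Nested spans are not handled.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate that collects the text of each <span class="copy">.
@interface CopySpanCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableArray *texts;
@property (nonatomic, strong) NSMutableString *current; // nil outside a target span
@end

@implementation CopySpanCollector

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict {
    if ([elementName isEqualToString:@"span"] &&
        [[attributeDict objectForKey:@"class"] isEqualToString:@"copy"]) {
        self.current = [NSMutableString string];   // start collecting
    } else if (self.current != nil && [elementName isEqualToString:@"br"]) {
        [self.current appendString:@" "];          // treat <br/> as a space
    }
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    [self.current appendString:string];            // no-op while current is nil
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    if (self.current != nil && [elementName isEqualToString:@"span"]) {
        [self.texts addObject:self.current];       // one finished field
        self.current = nil;
    }
}

@end

// Returns the collected strings, or nil if the input is not well-formed XML.
static NSArray *CollectCopySpans(NSString *xhtml) {
    CopySpanCollector *collector = [[CopySpanCollector alloc] init];
    collector.texts = [NSMutableArray array];
    NSXMLParser *parser =
        [[NSXMLParser alloc] initWithData:[xhtml dataUsingEncoding:NSUTF8StringEncoding]];
    parser.delegate = collector;
    return [parser parse] ? collector.texts : nil;
}
```

If the real page is ordinary, non-XHTML markup, [parser parse] will most likely return NO, in which case a regular expression or a lenient HTML parser is the more practical route.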
