I am trying to change this:

import urllib2 as urllib
... ...
file2 = urllib.urlopen(url2)
... ...
for line in file2:
    # Find the row label; the value sits two lines below it.
    indexfrom2 = line.find('Mean Temperature')
    if indexfrom2 > -1:
        nxtLn = file2.next()     # skip the line right after the label
        nextLine = file2.next()  # this line holds the value
        indexfrom21 = nextLine.find('"nobr"')
        if indexfrom21 > -1:
            indexto21 = nextLine.find('</span>&nbsp;&deg;C</span>',indexfrom21)
            # +23 skips the markup between '"nobr"' and the number itself
            code2 = nextLine[indexfrom21+23:indexto21]
            print code2

and make it look something like this:

class (...)  
def ....  
Temperature = parse( file2, '<span>Mean Temperature</span></td>', '<b>' )  

but I'm not sure how to do it. The block of code above is repeated in my script for every value I extract (mean temperature, max temperature, humidity, pressure, etc.), which looks unprofessional. I want to move it into a single parsing function that I can call once per value or in a loop, so that I don't have to repeat the same code again and again.

  • Personally I would use a regex for this task. Commented Jul 1, 2011 at 19:28
  • Also what on earth is the '<b>' in your "ideal" version. That's nowhere in the original code... Commented Jul 1, 2011 at 19:29
  • </span>&nbsp;&deg;C</span> and <b> are markers in the HTML source. I index them because the value I'm fetching lies between two such markers somewhere in the HTML (located with find), and I slice it out from there. That's the general method. Now I want to make it look more professional, with something like: Temperature = parse( file2, '<span>Mean Temperature</span></td>', '<b>' ) Commented Jul 1, 2011 at 19:35
  • The original code does not contain '<b>', so it's a bit hard to determine what maps to what. It would be easier if you just kept things consistent and plugged in whatever value your original code is referencing. Commented Jul 1, 2011 at 19:40
  • @Chris: thanks for the update. I'm really new to Python and new to asking for help online, so I don't know exactly what I should and shouldn't ask. But your feedback helped for now :). OK, the URL is: wunderground.com/history/airport/EGPH/2010/6/30/… Commented Jul 1, 2011 at 19:46

1 Answer

You probably want to use BeautifulSoup for this. It's the canonical way to parse HTML in Python (and it works pretty well even in some horrible edge cases). If you continue with your current approach, you're relying on things like fixed line and character offsets, so your code is brittle in the face of minor document structure changes.

http://www.crummy.com/software/BeautifulSoup/
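
If you do stay with the string-searching approach, the repeated blocks can at least be folded into one helper. The following is only a minimal sketch using the standard library (written for Python 3, so the syntax differs slightly from the urllib2 code in the question). The sample markup, the two marker strings, and the helper's signature are assumptions made for illustration, inferred from the offsets in the question rather than taken from the real page.

```python
# Hypothetical sample markup; the real page may be laid out differently.
SAMPLE_HTML = '''<td><span>Mean Temperature</span></td>
<td>
  <span class="nobr"><span class="b">16</span>&nbsp;&deg;C</span>
</td>
<td><span>Max Temperature</span></td>
<td>
  <span class="nobr"><span class="b">21</span>&nbsp;&deg;C</span>
</td>
'''

# Assumed markers: the text right before and right after the number.
START_MARKER = '"nobr"><span class="b">'
END_MARKER = '</span>'


def parse(lines, label, start_marker, end_marker, skip=1):
    """Return the text between the two markers on the value line that
    follows `label`, or None if the label or a marker is not found."""
    it = iter(lines)
    for line in it:
        if label not in line:
            continue
        # Mirror the original loop: skip `skip` lines, then read the value line.
        for _ in range(skip):
            next(it, None)
        value_line = next(it, '')
        start = value_line.find(start_marker)
        if start == -1:
            return None
        start += len(start_marker)
        end = value_line.find(end_marker, start)
        if end == -1:
            return None
        return value_line[start:end].strip()
    return None


# Read the document once, then look up each field by its label.
LINES = SAMPLE_HTML.splitlines()
for field in ('Mean Temperature', 'Max Temperature'):
    print(field, '=', parse(LINES, field, START_MARKER, END_MARKER))
```

Reading the page into a list of lines first means each lookup starts from the top, so the fields can be requested in any order. Using len(start_marker) instead of a hard-coded +23 keeps the slice correct if the marker string changes.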

Comments

Yes, I looked at it, but as this is my summer project, I can't use any third-party application or library.
If you're really parsing the weather underground files, why aren't you getting the data in a format appropriate for parsing? Screen scraping HTML from third-party sites is extremely bad practice - not only is it rude, but it breaks any time the website is updated. Since you're learning, don't learn to do things the wrong way.
The URL for the comma-delimited version of that file is here: wunderground.com/history/airport/EGPH/2010/6/30/…
Thanks for the suggestion. Can I use the comma-delimited version to get the following? I guess I'll have to keep the URL approach, as I need the daily averages along with the min and max temperatures, as in the output below:
Mean Temperature = 16 Degree C... Max Temperature = 21 Degree C... Min Temperature = 11 Degree C... Moisture:... Dew Point = 11 Degree C... Humidity:... Average Humidity = 71... Sea Level Pressure:... Sea Level Pressure = 1016.40 pHb... Using '...' as I don't know how to go to the next line in a comment ... Do you have any suggestions? My script already produces all of the output above with the method shown; I just want to restructure it using a properly defined function, as in docs.python.org/tutorial/controlflow.html#defining-functions
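
Daily summaries like these can probably be computed from the comma-delimited version with the standard csv module alone. The sketch below (Python 3) runs on made-up sample rows. The column names TimeBST, TemperatureC and Humidity, and the helper name summarize, are assumptions for illustration only, so check the header row of the real file before relying on them.

```python
import csv
import io

# Made-up sample data; the real file's columns and values may differ.
SAMPLE_CSV = """TimeBST,TemperatureC,Humidity
12:20 AM,11.0,87
6:20 AM,12.0,82
12:20 PM,20.0,56
3:20 PM,21.0,59
"""


def summarize(csv_text, column):
    """Return the mean, max and min of a numeric column, or None if the
    column is missing or holds no numeric values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = []
    for row in reader:
        try:
            values.append(float(row[column]))
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with a missing or non-numeric value
    if not values:
        return None
    return {
        'mean': sum(values) / len(values),
        'max': max(values),
        'min': min(values),
    }


temps = summarize(SAMPLE_CSV, 'TemperatureC')
print('Mean Temperature =', temps['mean'], 'Degree C')
print('Max Temperature =', temps['max'], 'Degree C')
print('Min Temperature =', temps['min'], 'Degree C')
print('Average Humidity =', summarize(SAMPLE_CSV, 'Humidity')['mean'])
```

Because the columns are looked up by name, one function covers every field, and a change in column order does not break it.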