Here's my project: I'm graphing weather data from WeatherBug using RRDTool. I need a simple, efficient way to download the weather data from WeatherBug. I was using a terribly inefficient bash-script-scraper but moved on to BeautifulSoup. The performance is just too slow (it's running on a Raspberry Pi) so I need to use LXML.

What I have so far:

from lxml import etree
doc = etree.parse('weather.xml')
print(doc.xpath("//aws:weather/aws:ob/aws:temp"))

But I get an error message. Here is weather.xml:

<?xml version="1.0" encoding="UTF-8"?>

<aws:weather xmlns:aws="http://www.aws.com/aws">
  <aws:api version="2.0"/>
  <aws:WebURL>http://weather.weatherbug.com/PA/Tunkhannock-weather.html?ZCode=Z5546&amp;Units=0&amp;stat=TNKCN</aws:WebURL>
  <aws:InputLocationURL>http://weather.weatherbug.com/PA/Tunkhannock-weather.html?ZCode=Z5546&amp;Units=0</aws:InputLocationURL>
  <aws:ob>
    <aws:ob-date>
      <aws:year number="2013"/>
      <aws:month number="1" text="January" abbrv="Jan"/>
      <aws:day number="11" text="Friday" abbrv="Fri"/>
      <aws:hour number="10" hour-24="22"/>
      <aws:minute number="26"/>
      <aws:second number="00"/>
      <aws:am-pm abbrv="PM"/>
      <aws:time-zone offset="-5" text="Eastern Standard Time (USA)" abbrv="EST"/>
    </aws:ob-date>
    <aws:requested-station-id/>
    <aws:station-id>TNKCN</aws:station-id>
    <aws:station>Tunkhannock HS</aws:station>
    <aws:city-state zipcode="18657">Tunkhannock, PA</aws:city-state>
    <aws:country>USA</aws:country>
    <aws:latitude>41.5663871765137</aws:latitude>
    <aws:longitude>-75.9794464111328</aws:longitude>
    <aws:site-url>http://www.tasd.net/highschool/index.cfm</aws:site-url>
    <aws:aux-temp units="&amp;deg;F">-100</aws:aux-temp>
    <aws:aux-temp-rate units="&amp;deg;F">0</aws:aux-temp-rate>
    <aws:current-condition icon="http://deskwx.weatherbug.com/images/Forecast/icons/cond013.gif">Cloudy</aws:current-condition>
    <aws:dew-point units="&amp;deg;F">40</aws:dew-point>
    <aws:elevation units="ft">886</aws:elevation>
    <aws:feels-like units="&amp;deg;F">41</aws:feels-like>
    <aws:gust-time>
      <aws:year number="2013"/>
      <aws:month number="1" text="January" abbrv="Jan"/>
      <aws:day number="11" text="Friday" abbrv="Fri"/>
      <aws:hour number="12" hour-24="12"/>
      <aws:minute number="18"/>
      <aws:second number="00"/>
      <aws:am-pm abbrv="PM"/>
      <aws:time-zone offset="-5" text="Eastern Standard Time (USA)" abbrv="EST"/>
    </aws:gust-time>
    <aws:gust-direction>NNW</aws:gust-direction>
    <aws:gust-direction-degrees>323</aws:gust-direction-degrees>
    <aws:gust-speed units="mph">17</aws:gust-speed>
    <aws:humidity units="%">98</aws:humidity>
    <aws:humidity-high units="%">100</aws:humidity-high>
    <aws:humidity-low units="%">61</aws:humidity-low>
    <aws:humidity-rate>3</aws:humidity-rate>
    <aws:indoor-temp units="&amp;deg;F">77</aws:indoor-temp>
    <aws:indoor-temp-rate units="&amp;deg;F">-1.1</aws:indoor-temp-rate>
    <aws:light>0</aws:light>
    <aws:light-rate>0</aws:light-rate>
    <aws:moon-phase moon-phase-img="http://api.wxbug.net/images/moonphase/mphase01.gif">0</aws:moon-phase>
    <aws:pressure units="&quot;">30.09</aws:pressure>
    <aws:pressure-high units="&quot;">30.5</aws:pressure-high>
    <aws:pressure-low units="&quot;">30.08</aws:pressure-low>
    <aws:pressure-rate units="&quot;/h">-0.01</aws:pressure-rate>
    <aws:rain-month units="&quot;">0.11</aws:rain-month>
    <aws:rain-rate units="&quot;/h">0</aws:rain-rate>
    <aws:rain-rate-max units="&quot;/h">0.12</aws:rain-rate-max>
    <aws:rain-today units="&quot;">0.09</aws:rain-today>
    <aws:rain-year units="&quot;">0.11</aws:rain-year>
    <aws:temp units="&amp;deg;F">41</aws:temp>
    <aws:temp-high units="&amp;deg;F">42</aws:temp-high>
    <aws:temp-low units="&amp;deg;F">29</aws:temp-low>
    <aws:temp-rate units="&amp;deg;F/h">-0.9</aws:temp-rate>
    <aws:sunrise>
      <aws:year number="2013"/>
      <aws:month number="1" text="January" abbrv="Jan"/>
      <aws:day number="11" text="Friday" abbrv="Fri"/>
      <aws:hour number="7" hour-24="07"/>
      <aws:minute number="29"/>
      <aws:second number="53"/>
      <aws:am-pm abbrv="AM"/>
      <aws:time-zone offset="-5" text="Eastern Standard Time (USA)" abbrv="EST"/>
    </aws:sunrise>
    <aws:sunset>
      <aws:year number="2013"/>
      <aws:month number="1" text="January" abbrv="Jan"/>
      <aws:day number="11" text="Friday" abbrv="Fri"/>
      <aws:hour number="4" hour-24="16"/>
      <aws:minute number="54"/>
      <aws:second number="19"/>
      <aws:am-pm abbrv="PM"/>
      <aws:time-zone offset="-5" text="Eastern Standard Time (USA)" abbrv="EST"/>
    </aws:sunset>
    <aws:wet-bulb units="&amp;deg;F">40.802</aws:wet-bulb>
    <aws:wind-speed units="mph">3</aws:wind-speed>
    <aws:wind-speed-avg units="mph">1</aws:wind-speed-avg>
    <aws:wind-direction>S</aws:wind-direction>
    <aws:wind-direction-degrees>163</aws:wind-direction-degrees>
    <aws:wind-direction-avg>SE</aws:wind-direction-avg>
  </aws:ob>
</aws:weather>

I used http://www.xpathtester.com/test to test my XPath expression and it worked there, but in Python I get this error message:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lxml.etree.pyx", line 2043, in lxml.etree._ElementTree.xpath (src/lxml/lxml.etree.c:47570)
  File "xpath.pxi", line 376, in lxml.etree.XPathDocumentEvaluator.__call__ (src/lxml/lxml.etree.c:118247)
  File "xpath.pxi", line 239, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:116911)
  File "xpath.pxi", line 224, in lxml.etree._XPathEvaluatorBase._raise_eval_error (src/lxml/lxml.etree.c:116728)
lxml.etree.XPathEvalError: Undefined namespace prefix

This is all very new to me -- Python, XML, and LXML. All I want is the observed time and the temperature.

Do my problems have anything to do with that aws: prefix in front of everything? What does that even mean?

Any help you can offer is greatly appreciated!

2 Answers

8

The problem has everything "to do with that aws: prefix in front of everything": aws is a namespace prefix, and you have to tell xpath() which namespace URI it maps to. This is easy to do:

# Map the aws prefix to its namespace URI for this XPath call
print(doc.xpath('//aws:weather/aws:ob/aws:temp',
                namespaces={'aws': 'http://www.aws.com/aws'})[0].text)

The need for this mapping from namespace prefix to namespace URI is documented at http://lxml.de/xpathxslt.html.
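
Since the question asks for both the observed time and the temperature, here is a minimal sketch of how the same namespaces mapping could be reused for several lookups. The SAMPLE document is a trimmed stand-in for weather.xml, and read_observation and NS are hypothetical names chosen for illustration; the element and attribute names come from the XML in the question.

```python
from lxml import etree

# Trimmed stand-in for weather.xml; element names match the question's file.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<aws:weather xmlns:aws="http://www.aws.com/aws">
  <aws:ob>
    <aws:ob-date>
      <aws:year number="2013"/>
      <aws:month number="1" text="January" abbrv="Jan"/>
      <aws:day number="11" text="Friday" abbrv="Fri"/>
      <aws:hour number="10" hour-24="22"/>
      <aws:minute number="26"/>
      <aws:second number="00"/>
    </aws:ob-date>
    <aws:temp units="&amp;deg;F">41</aws:temp>
  </aws:ob>
</aws:weather>"""

# Prefix -> namespace URI, taken from the xmlns:aws declaration in the file.
NS = {"aws": "http://www.aws.com/aws"}


def read_observation(root):
    """Return (timestamp, temperature) as strings (hypothetical helper)."""
    def first(path):
        # xpath() returns a list; take the first match.
        return root.xpath(path, namespaces=NS)[0]

    date = "//aws:ob/aws:ob-date"
    parts = [
        first(date + "/aws:year/@number"),
        first(date + "/aws:month/@number"),
        first(date + "/aws:day/@number"),
        first(date + "/aws:hour/@hour-24"),
        first(date + "/aws:minute/@number"),
        first(date + "/aws:second/@number"),
    ]
    stamp = "%04d-%02d-%02d %02d:%02d:%02d" % tuple(int(p) for p in parts)
    temp = first("//aws:ob/aws:temp/text()")
    return stamp, temp


stamp, temp = read_observation(etree.fromstring(SAMPLE))
print(stamp, temp)
```

With the real file, etree.parse('weather.xml').getroot() could be passed in place of the parsed sample.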

6

Try something like this:

from lxml import etree

# Register the aws prefix so XPath expressions can use it
ns = etree.FunctionNamespace("http://www.aws.com/aws")
ns.prefix = "aws"

doc = etree.parse('weather.xml')
print(doc.xpath("//aws:weather/aws:ob/aws:temp")[0].text)

See this link: http://lxml.de/extensions.html
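
Another way to avoid repeating the mapping on every call is to compile the expression once with lxml's etree.XPath class, which accepts a namespaces argument. This is only a sketch: find_temp, NS, and the one-line SAMPLE document are hypothetical names and data used for illustration.

```python
from lxml import etree

# Prefix -> namespace URI, as declared by xmlns:aws in weather.xml.
NS = {"aws": "http://www.aws.com/aws"}

# Compile once; the namespace mapping is bound to the compiled expression.
find_temp = etree.XPath("//aws:ob/aws:temp/text()", namespaces=NS)

# Tiny stand-in for weather.xml (assumption: same structure as the real file).
SAMPLE = (b'<aws:weather xmlns:aws="http://www.aws.com/aws">'
          b'<aws:ob><aws:temp units="&amp;deg;F">41</aws:temp></aws:ob>'
          b'</aws:weather>')

doc = etree.fromstring(SAMPLE)
temps = find_temp(doc)  # list of matching text nodes
print(temps[0])
```

The compiled find_temp object can be called on any parsed document or element, so the mapping is written in one place only.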

1 Comment

I prefer this solution because I don't have to pass the namespace mapping every time I call the doc.xpath() method.
