Python XML parsing from website

Question

I am trying to Parse from a website. I am stuck. I will provide the XML below. It is coming from a webiste. I have two questions. What is the best way to read xml from a website, and then I am having trouble digging into the xml to get the rate I need.

The figure I need back is Base:OBS_VALUE 0.12

What I have so far:

from xml.dom import minidom
import urllib


document = ('http://www.newyorkfed.org/markets/omo/dmm/fftoXML.cfm?type=daily''r')
web = urllib.urlopen(document)
get_web = web.read()
xmldoc = minidom.parseString(document)

ff_DataSet = xmldoc.getElementsByTagName('ff:DataSet')[0]

ff_series = ff_DataSet.getElementsByTagName('ff:Series')[0]

for line in ff_series:
    price = line.getElementsByTagName('base:OBS_VALUE')[0].firstChild.data
    print(price)

XML code from webiste:

-<Header> <ID>FFD</ID>
 <Test>false</Test> 
 <Name xml:lang="en">Federal Funds daily averages</Name> <Prepared>2013-05-08</Prepared>
 <Sender id="FRBNY"> <Name xml:lang="en">Federal Reserve Bank of New York</Name> 
<Contact>   
<Name xml:lang="en">Public Information Web Team</Name> <Email>[email protected]</Email>  
</Contact> 
</Sender> 
<!--ReportingBegin></ReportingBegin-->
</Header> 
<ff:DataSet> -<ff:Series TIME_FORMAT="P1D" DISCLAIMER="G" FF_METHOD="D" DECIMALS="2" AVAILABILITY="A"> 
<ffbase:Key> 
<base:FREQ>D</base:FREQ> 
<base:RATE>FF</base:RATE>
<base:MATURITY>O</base:MATURITY> 
<ffbase:FF_SCOPE>D</ffbase:FF_SCOPE> 
</ffbase:Key> 
<ff:Obs OBS_CONF="F" OBS_STATUS="A">
<base:TIME_PERIOD>2013-05-07</base:TIME_PERIOD>
<base:OBS_VALUE>0.12</base:OBS_VALUE>

Jean-François Bourdon · Accepted Answer · 2018-04-24 23:10:54Z

8

If you wanted to stick with xml.dom.minidom, try this...

from xml.dom import minidom
import urllib

url_str = 'http://www.newyorkfed.org/markets/omo/dmm/fftoXML.cfm?type=daily'
xml_str = urllib.urlopen(url_str).read()
xmldoc = minidom.parseString(xml_str)

obs_values = xmldoc.getElementsByTagName('base:OBS_VALUE')
# prints the first base:OBS_VALUE it finds
print obs_values[0].firstChild.nodeValue

# prints the second base:OBS_VALUE it finds
print obs_values[1].firstChild.nodeValue

# prints all base:OBS_VALUE in the XML document
for obs_val in obs_values:
    print obs_val.firstChild.nodeValue

However, if you want to use lxml, use underrun's solution. Also, your original code had some errors. You were actually attempting to parse the document variable, which was the web address. You needed to parse the xml returned from the website, which in your example is the get_web variable.

edited Apr 24, 2018 at 23:10

Jean-François Bourdon

2524 silver badges13 bronze badges

answered May 8, 2013 at 13:52

b10hazard

7,85911 gold badges44 silver badges59 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Trying_hard Over a year ago

The added info is appreciated

Moulde Over a year ago

Why did you change url_str to xml_str? Should be: xml_str = urllib.urlopen(url_str).read()

underrun · Accepted Answer · 2013-05-08 13:46:10Z

Take a look at your code:

document = ('http://www.newyorkfed.org/markets/omo/dmm/fftoXML.cfm?type=daily''r')
web = urllib.urlopen(document)
get_web = web.read()
xmldoc = minidom.parseString(document)

I'm not sure you have document correct unless you want http://www.newyorkfed.org/markets/omo/dmm/fftoXML.cfm?type=dailyr because that's what you'll get (the parens group in this case and strings listed next to each other automatically concatenate).

After that you do some work to create get_web but then you don't use it in the next line. Instead you try to parse your document which is the url...

Beyond that, I would totally suggest you use ElementTree, preferably lxml's ElementTree (http://lxml.de/). Also, lxml's etree parser takes a file-like object which can be a urllib object. If you did, after straightening out the rest of your doc, you could do this:

from lxml import etree
from io import StringIO
import urllib

url = 'http://www.newyorkfed.org/markets/omo/dmm/fftoXML.cfm?type=daily'
root = etree.parse(urllib.urlopen(url))

for obs in root.xpath('/ff:DataSet/ff:Series/ff:Obs'):
    price = obs.xpath('./base:OBS_VALUE').text
    print(price)

Collectives™ on Stack Overflow

Python XML parsing from website

2 Answers 2

2 Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

2 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related