I'm trying to consume an XML API. I'd like to have some Python objects that represent the XML data. I have several XSD and some example API responses from the documentation.
- http://www.isan.org/schema/v1.11/common/common.xsd
- http://www.isan.org/schema/v1.21/common/serial.xsd
- http://www.isan.org/schema/v1.11/common/version.xsd
- http://www.isan.org/ISAN/isan.xsd
- http://www.isan.org/schema/v1.11/common/title.xsd
- http://www.isan.org/schema/v1.11/common/externalid.xsd
- http://www.isan.org/schema/v1.11/common/participant.xsd
- http://www.isan.org/schema/v1.11/common/language.xsd
- http://www.isan.org/schema/v1.11/common/country.xsd
Here's one example XML response:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<serial:serialHeaderType xmlns:isan="http://www.isan.org/ISAN/isan"
xmlns:title="http://www.isan.org/schema/v1.11/common/title"
xmlns:serial="http://www.isan.org/schema/v1.21/common/serial"
xmlns:externalid="http://www.isan.org/schema/v1.11/common/externalid"
xmlns:common="http://www.isan.org/schema/v1.11/common/common"
xmlns:participant="http://www.isan.org/schema/v1.11/common/participant"
xmlns:language="http://www.isan.org/schema/v1.11/common/language"
xmlns:country="http://www.isan.org/schema/v1.11/common/country">
<common:status>
<common:DataType>SERIAL_HEADER_TYPE</common:DataType>
<common:ISAN root="0000-0002-3B9F"/>
<common:WorkStatus>ACTIVE</common:WorkStatus>
</common:status>
<serial:SerialHeaderId root="0000-0002-3B9F"/>
<serial:MainTitles>
<title:TitleDetail>
<title:Title>Braquo</title:Title>
<title:Language>
<language:LanguageLabel>French</language:LanguageLabel>
<language:LanguageCode>
<language:CodingSystem>ISO639_2</language:CodingSystem>
<language:ISO639_2Code>FRE</language:ISO639_2Code>
</language:LanguageCode>
</title:Language>
<title:TitleKind>ORIGINAL</title:TitleKind>
</title:TitleDetail>
</serial:MainTitles>
<serial:TotalEpisodes>11</serial:TotalEpisodes>
<serial:TotalSeasons>0</serial:TotalSeasons>
<serial:MinDuration>
<common:TimeUnit>MIN</common:TimeUnit>
<common:TimeValue>45</common:TimeValue>
</serial:MinDuration>
<serial:MaxDuration>
<common:TimeUnit>MIN</common:TimeUnit>
<common:TimeValue>144</common:TimeValue>
</serial:MaxDuration>
<serial:MinYear>2009</serial:MinYear>
<serial:MaxYear>2009</serial:MaxYear>
<serial:MainParticipantList>
<participant:Participant>
<participant:FirstName>Frédéric</participant:FirstName>
<participant:LastName>Schoendoerffer</participant:LastName>
<participant:RoleCode>DIR</participant:RoleCode>
</participant:Participant>
<participant:Participant>
<participant:FirstName>Karole</participant:FirstName>
<participant:LastName>Rocher</participant:LastName>
<participant:RoleCode>ACT</participant:RoleCode>
</participant:Participant>
</serial:MainParticipantList>
<serial:CompanyList>
<common:Company>
<common:CompanyKind>PRO</common:CompanyKind>
<common:CompanyName>R.T.B.F.</common:CompanyName>
</common:Company>
<common:Company>
<common:CompanyKind>PRO</common:CompanyKind>
<common:CompanyName>Capa Drama</common:CompanyName>
</common:Company>
<common:Company>
<common:CompanyKind>PRO</common:CompanyKind>
<common:CompanyName>Marathon</common:CompanyName>
</common:Company>
</serial:CompanyList>
</serial:serialHeaderType>
I tried simply ignoring the XSD and using lxml.objectify on the XML I'd get from the API. I had a problem with namespaces. Having to refer to every child node with its explicit namespace was a real pain and doesn't make for readable code.
from lxml import objectify
obj = objectify.fromstring(response)
print obj.MainTitles.TitleDetail
# This will fail to find the element because you need to specify the namespace
print obj.MainTitles['{http://www.isan.org/schema/v1.11/common/title}TitleDetail']
# Or something like that, I couldn't get it to work, and I'd much rather use attributes and not specify the namespace
So then I tried generateDS to create some Python class definitions for me. I've lost the error messages that this attempt gave me but I couldn't get it to work. It would generate a module for each XSD that I gave it but it wouldn't parse the example XML.
I'm now trying pyxb and this seems much nicer so far. It's generating nicer definitions than generateDS (splitting them into multiple, reusable modules) but it won't parse the XML:
from models import serial
obj = serial.CreateFromDocument(response)
Traceback (most recent call last):
...
File "/vagrant/isan/isan.py", line 58, in lookup
return serial.CreateFromDocument(resp.content)
File "/vagrant/isan/models/serial.py", line 69, in CreateFromDocument
instance = handler.rootObject()
File "/home/vagrant/venv/lib/python2.7/site-packages/pyxb/binding/saxer.py", line 285, in rootObject
raise pyxb.UnrecognizedDOMRootNodeError(self.__rootObject)
UnrecognizedDOMRootNodeError: <pyxb.utils.saxdom.Element object at 0x2b53664dc850>
The unrecognised node is the <serial:serialHeaderType> node from the example. Looking at the pyxb source it seems that this error comes about "if the top-level element got processed as a DOM instance" but I don't know what this means or how to prevent it.
I've run out of steam for trying to explore this, I don't know what to do next.