Conditional XML parsing in Python

Question

I would like to select the information of all child elements in a very large XML file if their parent has certain information. If, as in the sample code, the sn node has the attribute elliptic="yes", then select the v node and retrieve its attribute values (e.g. wd="vulgui").

 <sentence>
<sadv arg="argM" func="cc" tem="tmp">
  <sadv>
    <grup.adv>
      <r lem="després" pos="rg" wd="Després"/>
      <sp>
        <prep>
          <s lem="de" pos="sps00" postype="preposition" wd="de"/>
        </prep>
        <sn entityref="nne">
          <spec gen="m" num="p">
            <z lem="15" ne="number" wd="15"/>
          </spec>
          <grup.nom gen="m" num="p">
            <n gen="m" lem="any" num="p" pos="ncmp000" postype="common" sense="16:10917509" wd="anys"/>
            <sp>
              <prep>
                <s lem="de" pos="sps00" postype="preposition" wd="de"/>
              </prep>
              <sn entityref="nne">
                <spec gen="f" num="s">
                  <d coreftype="ident" entity="entity3" entityref="nne" gen="f" lem="el_seu" num="s" person="3" pos="dp3fs0" postype="possessive" wd="la_seva"/>
                </spec>
                <grup.nom gen="f" num="s">
                  <n gen="f" lem="creació" num="s" pos="ncfs000" postype="common" sense="16:00583085" wd="creació"/>
                </grup.nom>
              </sn>
            </sp>
          </grup.nom>
        </sn>
      </sp>
    </grup.adv>
  </sadv>
  <f lem="," pos="fc" punct="comma" wd=","/>
</sadv>
<sn arg="arg0" coreftype="ident" elliptic="yes" entity="entity3" entityref="nne" func="suj" tem="agt"/>
<grup.verb>
  <v lem="presentar" lss="A32.ditransitive-patient-benefactive" mood="indicative" num="p" person="3" pos="vmip3p0" postype="main" tense="present" wd="presenten"/>
</grup.verb>
<sn arg="arg1" entityref="spec" func="cd" tem="pat">
  <spec gen="m" num="s">
    <d gen="m" lem="un" num="s" pos="di0ms0" postype="indefinite" wd="un"/>
  </spec>
  <grup.nom gen="m" num="s">
    <s.a gen="m" num="s">
      <grup.a gen="m" num="s">
        <a gen="m" lem="nou" num="s" pos="aq0ms0" postype="qualificative" wd="nou"/>
      </grup.a>
    </s.a>
    <n gen="m" lem="disc" num="s" pos="ncms000" postype="common" sense="16:03112307" wd="disc"/>
    <sn entityref="ne" ne="other">
      <f lem="," pos="fc" punct="comma" wd=","/>
      <grup.nom>
        <f lem="'" pos="fz" punct="mathsign" wd="'"/>
        <n lem="Electroretard" ne="other" pos="np0000a" postype="proper" sense="16:cs1" wd="Electroretard"/>
        <f lem="'" pos="fz" punct="mathsign" wd="'"/>
      </grup.nom>
    </sn>
  </grup.nom>
</sn>
<f lem="." pos="fp" punct="period" wd="."/>

I couldn't get any further than this:

for sn in root.iter('sn'):
    rank = sn.get('elliptic')
    if rank == 'yes':
        ...  # unsure how to continue from here

How could I continue this code? I was thinking of something like:

"iterate through all children whose parent has @elliptic="yes""

simkusr · Accepted Answer · 2018-04-12 11:00:18Z


Well, as I understand it, the simplest way is to build an XPath expression and use it inside an if check (or a try/except block):

xpath = '(//sn[@elliptic="yes"])[1]'

Now write an if statement that checks whether this element exists in your XML. If it does, do what you need, e.g. use further XPath expressions to extract the required values.

P.S. The [1] selects the first matching element in the XML. If there is more than one match, omitting it can break code that expects a single element. So use a counter i in the XPath, (//sn[@elliptic="yes"])[i], to visit each match in turn.
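
The check described above could be sketched with the standard library's xml.etree.ElementTree, which the asker reports using. This is only a sketch under stated assumptions: each `sentence` element is treated as the unit of interest, and the `v` elements are collected from the same sentence as the elliptic `sn` (in the sample the elliptic `sn` is self-closing, so the verb is not one of its children). The helper name `verbs_in_elliptic_sentences`, the `corpus` wrapper element, and the trimmed inline document are hypothetical. ElementTree implements only a subset of XPath, so the parenthesised `(...)[1]` form is avoided here; `find` and `iter` already cover every match without an explicit counter.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for the real corpus file.
SAMPLE = """
<corpus>
  <sentence>
    <sn arg="arg0" elliptic="yes" func="suj"/>
    <grup.verb>
      <v wd="presenten"/>
    </grup.verb>
  </sentence>
  <sentence>
    <sn arg="arg0" func="suj"/>
    <grup.verb>
      <v wd="vulgui"/>
    </grup.verb>
  </sentence>
</corpus>
"""


def verbs_in_elliptic_sentences(root):
    """Return the wd attribute of every v in sentences that contain an elliptic sn."""
    words = []
    for sentence in root.iter("sentence"):
        # find() returns None when no sn in this sentence has elliptic="yes".
        if sentence.find(".//sn[@elliptic='yes']") is None:
            continue
        for v in sentence.iter("v"):
            words.append(v.get("wd"))
    return words


root = ET.fromstring(SAMPLE.strip())
elliptic_verbs = verbs_in_elliptic_sentences(root)
print(elliptic_verbs)
```

If the verbs of interest should be restricted further (for example, only the `grup.verb` that directly follows the elliptic `sn`), the inner loop would need to walk the sentence's direct children in order instead of using `iter`.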


7 Comments

Paul Kremershof Over a year ago

Thank you very much, Rolandas. The problem is that I need to find all the children of the sn parent nodes if the condition (elliptic = yes) is true. I should have noted that the example above is just an excerpt from a very large file.

simkusr Over a year ago

OK, what are you using: BeautifulSoup, Scrapy, or are you just reading a file that contains this XML? Could you post an example of the full structure? A larger example would make it easier. :)

Paul Kremershof Over a year ago

I'm just reading the XML file via ElementTree. I'll put a larger example in the main question field ;-)

simkusr Over a year ago

Check my answer here: stackoverflow.com/questions/7019350/…. You can parse this with the bs4 module. Where a URL is given there, replace it with str(yourXML) and that will work. Then call BeautifulSoup(yourXML, 'lxml').find_all('tag', {'elliptic': 'yes'}) and read .descendants on each result to get all of its children and their children.

simkusr Over a year ago

Don't forget that find_all returns every match, so you will need a for loop to process each item. If your file is very big, try limiting your query: .find_all('tag', {'elliptic': 'yes'}, limit=10). Here limit=10 caps the result at 10 items and stops the search once they have been found.
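
For a very large file read through ElementTree, one possible way to apply the same "cap the results" idea is to stream the document with `ET.iterparse` rather than loading it whole. The sketch below is an assumption-laden illustration, not the commenter's method: the generator name `iter_elliptic_verbs`, its `limit` parameter (loosely mirroring `limit=10` above), the `corpus` wrapper, and the demo bytes are all hypothetical, and it assumes `sentence` elements are not nested.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical miniature corpus used only to exercise the generator.
DEMO_BYTES = (
    b"<corpus>"
    b"<sentence><sn elliptic='yes'/><grup.verb><v wd='presenten'/></grup.verb></sentence>"
    b"<sentence><sn/><grup.verb><v wd='vulgui'/></grup.verb></sentence>"
    b"<sentence><sn elliptic='yes'/><grup.verb><v wd='canten'/></grup.verb></sentence>"
    b"</corpus>"
)


def iter_elliptic_verbs(source, limit=None):
    """Yield the wd of each v in sentences containing an sn with elliptic="yes".

    source is a filename or binary file object; limit optionally caps the
    number of yielded values, after which parsing stops.
    """
    yielded = 0
    # An "end" event fires once an element and all its children are parsed.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "sentence":
            continue
        if elem.find(".//sn[@elliptic='yes']") is not None:
            for v in elem.iter("v"):
                yield v.get("wd")
                yielded += 1
                if limit is not None and yielded >= limit:
                    return
        # Discard the finished sentence's children to keep memory use low.
        elem.clear()


first_only = list(iter_elliptic_verbs(io.BytesIO(DEMO_BYTES), limit=1))
all_matches = list(iter_elliptic_verbs(io.BytesIO(DEMO_BYTES)))
print(first_only, all_matches)
```

Passing a real path instead of the `io.BytesIO` object should work the same way, since `iterparse` accepts either a filename or a file object.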
