I would like to select information from elements in a very large XML file when a related node carries a certain attribute. If, as in the sample below, an sn node has the attribute elliptic="yes", then I want to select the v node and retrieve its attribute values (e.g. wd="vulgui").

 <sentence>
<sadv arg="argM" func="cc" tem="tmp">
  <sadv>
    <grup.adv>
      <r lem="després" pos="rg" wd="Després"/>
      <sp>
        <prep>
          <s lem="de" pos="sps00" postype="preposition" wd="de"/>
        </prep>
        <sn entityref="nne">
          <spec gen="m" num="p">
            <z lem="15" ne="number" wd="15"/>
          </spec>
          <grup.nom gen="m" num="p">
            <n gen="m" lem="any" num="p" pos="ncmp000" postype="common" sense="16:10917509" wd="anys"/>
            <sp>
              <prep>
                <s lem="de" pos="sps00" postype="preposition" wd="de"/>
              </prep>
              <sn entityref="nne">
                <spec gen="f" num="s">
                  <d coreftype="ident" entity="entity3" entityref="nne" gen="f" lem="el_seu" num="s" person="3" pos="dp3fs0" postype="possessive" wd="la_seva"/>
                </spec>
                <grup.nom gen="f" num="s">
                  <n gen="f" lem="creació" num="s" pos="ncfs000" postype="common" sense="16:00583085" wd="creació"/>
                </grup.nom>
              </sn>
            </sp>
          </grup.nom>
        </sn>
      </sp>
    </grup.adv>
  </sadv>
  <f lem="," pos="fc" punct="comma" wd=","/>
</sadv>
<sn arg="arg0" coreftype="ident" elliptic="yes" entity="entity3" entityref="nne" func="suj" tem="agt"/>
<grup.verb>
  <v lem="presentar" lss="A32.ditransitive-patient-benefactive" mood="indicative" num="p" person="3" pos="vmip3p0" postype="main" tense="present" wd="presenten"/>
</grup.verb>
<sn arg="arg1" entityref="spec" func="cd" tem="pat">
  <spec gen="m" num="s">
    <d gen="m" lem="un" num="s" pos="di0ms0" postype="indefinite" wd="un"/>
  </spec>
  <grup.nom gen="m" num="s">
    <s.a gen="m" num="s">
      <grup.a gen="m" num="s">
        <a gen="m" lem="nou" num="s" pos="aq0ms0" postype="qualificative" wd="nou"/>
      </grup.a>
    </s.a>
    <n gen="m" lem="disc" num="s" pos="ncms000" postype="common" sense="16:03112307" wd="disc"/>
    <sn entityref="ne" ne="other">
      <f lem="," pos="fc" punct="comma" wd=","/>
      <grup.nom>
        <f lem="'" pos="fz" punct="mathsign" wd="'"/>
        <n lem="Electroretard" ne="other" pos="np0000a" postype="proper" sense="16:cs1" wd="Electroretard"/>
        <f lem="'" pos="fz" punct="mathsign" wd="'"/>
      </grup.nom>
    </sn>
  </grup.nom>
</sn>
<f lem="." pos="fp" punct="period" wd="."/>

I couldn't come up with a solution after:

for sn in root.iter('sn'):
    rank = sn.get('elliptic')
    if rank == 'yes':

How could I continue this code? I was thinking of something like:

"iterate through all children whose parent has @elliptic="yes""

1 Answer

As I understand it, the simplest way is to build an XPath expression and use it inside an if check or a try/except block:

xpath = '(//sn[@elliptic="yes"])[1]'

Now write an if statement that checks whether this element exists in your XML. If it does, do what you need, e.g. use further XPath expressions to extract the data.

P.S. The [1] means you are selecting the first matching element in the XML. If there is more than one match, the expression can break without it, so use an index i in your XPath and iterate over it: (//sn[@elliptic="yes"])[i]
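A minimal sketch of the idea above, assuming the standard-library xml.etree.ElementTree that the asker mentions in the comments. In the sample, the v node is not a child of the elliptic sn but sits in the grup.verb sibling that follows it, so the sketch walks each parent's children and looks at the siblings after the match. The shortened SAMPLE string and the function name verbs_after_elliptic_sn are made up for illustration, and "the verb is in the next grup.verb sibling" is an assumption about the corpus layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample modelled on the excerpt in the question.
SAMPLE = """
<sentence>
  <sn arg="arg0" elliptic="yes" func="suj"/>
  <grup.verb>
    <v lem="presentar" pos="vmip3p0" wd="presenten"/>
  </grup.verb>
  <sn arg="arg1" func="cd">
    <grup.nom><n lem="disc" wd="disc"/></grup.nom>
  </sn>
</sentence>
"""


def verbs_after_elliptic_sn(root):
    """Collect attribute dicts of <v> nodes that follow an elliptic <sn>."""
    found = []
    # ElementTree has no parent pointers, so iterate over every element
    # as a potential parent and inspect its direct children in order.
    for parent in root.iter():
        children = list(parent)
        for i, child in enumerate(children):
            if child.tag == "sn" and child.get("elliptic") == "yes":
                # Assumption: the verb sits in the next <grup.verb> sibling.
                for sibling in children[i + 1:]:
                    if sibling.tag == "grup.verb":
                        found.extend(dict(v.attrib) for v in sibling.iter("v"))
                        break
    return found


root = ET.fromstring(SAMPLE)
verbs = verbs_after_elliptic_sn(root)
print([v["wd"] for v in verbs])  # prints ['presenten']
```

Each entry in verbs is a plain dict, so any other attribute (lem, pos, and so on) can be read the same way as wd.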


7 Comments

Thank you very much, Rolandas. The problem is that I need to find all the children of the sn parent nodes if the condition (elliptic = yes) is true. I should have noted that the example above is just an excerpt from a very large file.
OK, what are you using: BeautifulSoup, Scrapy, or do you just read a file that contains this XML? Could you post an example of the full structure? A larger example would make it easier. :)
I'm just reading the XML file via ElementTree. I'll put a larger example in the main question field ;-)
Check my answer here: stackoverflow.com/questions/7019350/…. You can do this by parsing with the bs4 module. Where a URL is given there, replace it with str(yourXML) and it will work. Then loop over BeautifulSoup(yourXML, 'lxml').find_all('tag', {'elliptic': 'yes'}) and read .descendants on each result to get all of its children and their children.
Don't forget that find_all returns every match, so you need a for loop to handle each item. If your file is very big, try limiting the query: .find_all('tag', {'elliptic': 'yes'}, limit=10). Here limit=10 caps the result at 10 items and stops the search once they are found.
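Since the asker reads the file with ElementTree and says it is very large, here is a hedged sketch of a streaming variant that collects all descendants of each elliptic sn. It assumes the file is a sequence of sentence elements under some root, which is only a guess based on the excerpt. The miniature SAMPLE corpus, the corpus root tag, and the function name descendants_of_elliptic_sn are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical miniature corpus; a real file would be passed by path instead.
SAMPLE = b"""<corpus>
  <sentence>
    <sn elliptic="yes"><grup.nom><n wd="disc"/></grup.nom></sn>
  </sentence>
  <sentence>
    <sn><grup.nom><n wd="anys"/></grup.nom></sn>
  </sentence>
</corpus>"""


def descendants_of_elliptic_sn(source):
    """Yield (tag, attributes) for every descendant of each elliptic <sn>."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Assumption: <sentence> is the unit of work, so wait until one closes.
        if elem.tag != "sentence":
            continue
        for sn in elem.iter("sn"):
            if sn.get("elliptic") == "yes":
                for node in sn.iter():
                    if node is not sn:  # iter() also yields sn itself
                        yield node.tag, dict(node.attrib)
        elem.clear()  # drop the finished sentence's contents to save memory


results = list(descendants_of_elliptic_sn(io.BytesIO(SAMPLE)))
print(results)
```

Clearing only at the end of each sentence, rather than at the end of each sn, keeps nested sn elements intact until their enclosing sentence has been processed.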