
Here is part of my XML file:

<a:p>
    <a:pPr lvl="2">
        <a:spcBef>
            <a:spcPts val="200" />
        </a:spcBef>
    </a:pPr>
    <a:r>
        <a:rPr lang="en-US" sz="1400" dirty="0" smtClean="0" />
        <a:t>The</a:t>
    </a:r>
    <a:r>
        <a:rPr lang="en-US" sz="1400" dirty="0" />
        <a:t>world</a:t>
    </a:r>
    <a:r>
        <a:rPr lang="en-US" sz="1400" dirty="0" smtClean="0" />
        <a:t>is small</a:t>
    </a:r>
</a:p>
<a:p>
    <a:pPr lvl="2">
        <a:spcBef>
            <a:spcPts val="200" />
        </a:spcBef>
    </a:pPr>
    <a:r>
        <a:rPr lang="en-US" sz="1400" dirty="0" smtClean="0" b="0" />
        <a:t>The</a:t>
    </a:r>
    <a:r>
        <a:rPr lang="en-US" sz="1400" dirty="0" b="0" />
        <a:t>world</a:t>
    </a:r>
    <a:r>
        <a:rPr lang="en-US" sz="1400" dirty="0" smtClean="0" b="0" />
        <a:t>is too big</a:t>
    </a:r>
</a:p>

I have written some code using lxml to extract the text. Because each sentence is split into separate pieces, I want to join them to form a single sentence such as The world is small... . Here is the code I wrote:

# file: the parsed slide XML (assumed to be an lxml tree)
path4 = file.xpath('/p:sld/p:cSld/p:spTree/p:sp/p:txBody/a:p/a:r/a:rPr', namespaces={'p':'http://schemas.openxmlformats.org/presentationml/2006/main',
                'a':'http://schemas.openxmlformats.org/drawingml/2006/main'})
if path4:
    for a in path4:
        if a.get('sz') == '1400' and a.xpath('node()') == [] and a.get('b') != '0':
            b = a.getparent()  # the a:r run
            c = b.getparent()  # the enclosing a:p paragraph
            d = c.xpath('./a:r/a:t/text()', namespaces={'p':'http://schemas.openxmlformats.org/presentationml/2006/main', 'a':'http://schemas.openxmlformats.org/drawingml/2006/main'})
            print(''.join(d))
        elif a.get('sz') == '1400' and a.xpath('node()') == [] and a.get('b') == '0':
            b = a.getparent()
            c = b.getparent()
            d = c.xpath('./a:r/a:t/text()', namespaces={'p':'http://schemas.openxmlformats.org/presentationml/2006/main', 'a':'http://schemas.openxmlformats.org/drawingml/2006/main'})
            print(''.join(d))

I get this output:

The world is small...
The world is small...
The world is small...

Expected output:

The world is small...

Any suggestions?

1 Answer


You are building and printing the whole sentence once for every a:rPr the loop visits, so each paragraph is printed as many times as it has runs.

Here's an example of what you should do instead:

test.xml:

<body xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"
      xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main">
    <a:p>
        <a:pPr lvl="2">
            <a:spcBef>
                <a:spcPts val="200"/>
            </a:spcBef>
        </a:pPr>
        <a:r>
            <a:rPr lang="en-US" sz="1400" dirty="0" smtClean="0"/>
            <a:t>The</a:t>
        </a:r>
        <a:r>
            <a:rPr lang="en-US" sz="1400" dirty="0"/>
            <a:t>world</a:t>
        </a:r>
        <a:r>
            <a:rPr lang="en-US" sz="1400" dirty="0" smtClean="0"/>
            <a:t>is small</a:t>
        </a:r>
    </a:p>
    <a:p>
        <a:pPr lvl="2">
            <a:spcBef>
                <a:spcPts val="200"/>
            </a:spcBef>
        </a:pPr>
        <a:r>
            <a:rPr lang="en-US" sz="1400" dirty="0" smtClean="0" b="0"/>
            <a:t>The</a:t>
        </a:r>
        <a:r>
            <a:rPr lang="en-US" sz="1400" dirty="0" b="0"/>
            <a:t>world</a:t>
        </a:r>
        <a:r>
            <a:rPr lang="en-US" sz="1400" dirty="0" smtClean="0" b="0"/>
            <a:t>is too big</a:t>
        </a:r>
    </a:p>
</body>

test.py:

from lxml import etree


tree = etree.parse('test.xml')
NAMESPACES = {'p': 'http://schemas.openxmlformats.org/presentationml/2006/main',
              'a': 'http://schemas.openxmlformats.org/drawingml/2006/main'}

# One a:p element per paragraph (sentence).
path = tree.xpath('/body/a:p', namespaces=NAMESPACES)

for outer_item in path:
    parts = []
    # Collect the text of every run (a:r) that carries run properties (a:rPr).
    for item in outer_item.xpath('./a:r/a:rPr', namespaces=NAMESPACES):
        parts.append(item.getparent().xpath('./a:t/text()', namespaces=NAMESPACES)[0])

    # Print once per paragraph, after all of its runs have been collected.
    print(" ".join(parts))

Output:

The world is small
The world is too big

So the fix is to loop over the a:p elements, collect the text of each run into parts, and print the joined result once per a:p, after all of its runs have been processed. I've left out the if statements for clarity.
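If the conditions from the question are still needed, one possible way to keep them is to decide once per a:p which branch a paragraph belongs to and store the joined sentence under that key in a dictionary. The sketch below is only an illustration under a few assumptions: the group names not_b0 and b0, the function name sentences_by_style, and the shortened SAMPLE document are all hypothetical, and a paragraph is classified by the b attribute of its first run. It uses findall/find/findtext, which exist in both lxml and the standard library's ElementTree, so it falls back to the latter when lxml is not installed.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback, same find* calls

NAMESPACES = {'a': 'http://schemas.openxmlformats.org/drawingml/2006/main'}

# Shortened, hypothetical sample modelled on test.xml.
SAMPLE = """
<body xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
  <a:p>
    <a:r><a:rPr sz="1400"/><a:t>The</a:t></a:r>
    <a:r><a:rPr sz="1400"/><a:t>world</a:t></a:r>
    <a:r><a:rPr sz="1400"/><a:t>is small</a:t></a:r>
  </a:p>
  <a:p>
    <a:r><a:rPr sz="1400" b="0"/><a:t>The</a:t></a:r>
    <a:r><a:rPr sz="1400" b="0"/><a:t>world</a:t></a:r>
    <a:r><a:rPr sz="1400" b="0"/><a:t>is too big</a:t></a:r>
  </a:p>
</body>
"""


def sentences_by_style(root):
    """Return one joined sentence per a:p, grouped by the b attribute."""
    groups = {'not_b0': [], 'b0': []}  # hypothetical group names
    for paragraph in root.findall('a:p', NAMESPACES):
        runs = paragraph.findall('a:r', NAMESPACES)
        props = [run.find('a:rPr', NAMESPACES) for run in runs]
        # Skip paragraphs with no runs, a run without a:rPr, or another size.
        if not runs or any(p is None or p.get('sz') != '1400' for p in props):
            continue
        sentence = ' '.join(
            run.findtext('a:t', default='', namespaces=NAMESPACES)
            for run in runs
        )
        # Assumption: the first run decides which branch the paragraph takes.
        key = 'b0' if props[0].get('b') == '0' else 'not_b0'
        groups[key].append(sentence)
    return groups


root = etree.fromstring(SAMPLE.strip())
for key, sentences in sorted(sentences_by_style(root).items()):
    for sentence in sentences:
        print(key + ': ' + sentence)
```

Because each sentence is appended exactly once per paragraph, adding another elif only means adding another key; the printing stays outside the per-run loop.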

Hope that helps.


8 Comments

Many thanks for taking the time to write this answer. I have tried this before. It works, but not when there are two or three elif conditions after the if condition, because the list parts should be printed outside the loop.
You're welcome. Well, you can keep track of several conditions by making parts a list of lists or a dictionary. It's hard to say without a real example. Could you please improve your question to show what you are talking about?
Well, I will post the whole example on pastecode.org and give the link here for your reference. Is that OK?
Well, you can try to extend your example to contain one more elif and more XML. If it's not possible, let's go with pastecode.org, thanks.
Oops, excuse me, I edited your answer by mistake. I am so sorry for that!