I have this XML that I want to parse with Python's xml.etree.ElementTree:

<draw:page draw:name="page3" draw:style-name="dp3" draw:master-page-name="Blue_5f_Curve1_5f_" presentation:presentation-page-layout-name="AL2T1" presentation:use-date-time-name="dtd1">
                <office:forms form:automatic-focus="false" form:apply-design-mode="false"/>
                <draw:frame presentation:style-name="pr4" draw:text-style-name="P1" draw:layer="layout" svg:width="26cm" svg:height="10cm" svg:x="1cm" svg:y="3cm" presentation:class="outline" presentation:user-transformed="true">
                    <draw:text-box>
                        <text:list text:style-name="L2">
                            <text:list-header>
                                <text:p>
                                    <text:span text:style-name="T2">Sources</text:span>
                                </text:p>
                                <text:list>
                                    <text:list-item>
                                        <text:p>
                                            <text:span text:style-name="T3">Medium.com</text:span>
                                        </text:p>
                                    </text:list-item>
                                    <text:list-item>
                                        <text:p>
                                            <text:span text:style-name="T3">Livres pdf </text:span>
                                        </text:p>
                                        <text:list>
                                            <text:list-item>
                                                <text:p>
                                                    <text:span text:style-name="T3">
                                                        Docker - 
                                                        <text:s/>
                                                        Concepts fondamentaux et déploiement d'applications distribuées edition ENI
                                                    </text:span>
                                                </text:p>
                                            </text:list-item>
                                            <text:list-item>
                                                <text:p>
                                                    <text:span text:style-name="T3">Docker - Prise en main et mise en pratique sur une architecture micro-services (JP. Gouigoux ENI)</text:span>
                                                </text:p>
                                            </text:list-item>
                                            <text:list-item>
                                                <text:p>
                                                    <text:span text:style-name="T3">Docker – Pratique des architectures à base de conteneurs Edition Dunod</text:span>
                                                </text:p>
                                            </text:list-item>
                                        </text:list>
                                    </text:list-item>
                                    <text:list-item>
                                        <text:p>
                                            <text:span text:style-name="T3">Youtube</text:span>
                                        </text:p>
                                    </text:list-item>
                                    <text:list-item>
                                        <text:p>
                                            <text:span text:style-name="T3">Stackoverflow</text:span>
                                        </text:p>
                                    </text:list-item>
                                    <text:list-item>
                                        <text:p>
                                            <text:span text:style-name="T3">playwithdocker</text:span>
                                        </text:p>
                                    </text:list-item>
                                </text:list>
                            </text:list-header>
                        </text:list>
                    </draw:text-box>
                </draw:frame>
                <draw:frame presentation:style-name="pr3" draw:text-style-name="P1" draw:layer="layout" svg:width="26cm" svg:height="1.328cm" svg:x="1cm" svg:y="0.5cm" presentation:class="title" presentation:user-transformed="true">
                    <draw:text-box>
                        <text:p text:style-name="P4">
                            <text:span text:style-name="T4">Docker</text:span>
                        </text:p>
                    </draw:text-box>
                </draw:frame>
                <presentation:notes draw:style-name="dp2">
                    <draw:page-thumbnail draw:style-name="gr1" draw:layer="layout" svg:width="14.848cm" svg:height="11.136cm" svg:x="3.075cm" svg:y="2.257cm" draw:page-number="3" presentation:class="page"/>
                    <draw:frame presentation:style-name="pr5" draw:text-style-name="P5" draw:layer="layout" svg:width="16.799cm" svg:height="13.364cm" svg:x="2.1cm" svg:y="14.107cm" presentation:class="notes" presentation:placeholder="true" presentation:user-transformed="true">
                        <draw:text-box/>
                    </draw:frame>
                </presentation:notes>
            </draw:page>
            

Ultimately, I want to get the text of every text:p element, or of its children such as <text:span ...> when they exist.

My Python code is:

    ostr = self.m_odf.read('content.xml')
    doc = ET.fromstring(ostr)
    self.pages = doc.findall("//*[@name='draw:page']")  # 'text:p')

I want to first get a list of the draw:page nodes and then search inside those nodes for the text:p elements.

My code raises the error "SyntaxError: cannot use absolute path on element".

I'm not used to this tag:x syntax in XML, so I can't work out how to query it with XPath; "//*[@name='draw:page']" doesn't seem to work.

Could you help me please?

  • I would recommend reproducing your error with a barebones version of your input xml. It could also help others lend a hand. Commented Sep 22, 2022 at 13:38
  • The XML is 87 pages long; this is only one page. Commented Sep 22, 2022 at 13:45
  • Can you reproduce your error with a smaller version of your xml? Commented Sep 22, 2022 at 13:50
  • You can test here: freeformatter.com/xpath-tester.html with the full file (all the styles need to be loaded...): s3.eu-west-3.amazonaws.com/pretty.xml/pretty.xml Commented Sep 22, 2022 at 14:17

1 Answer

Your XML is missing a lot of namespace declarations, but they are surely declared higher up in the XML tree. If you cannot use those namespaces, you can use the local-name() function to select elements by name without the namespace prefix.
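
If the namespaces are available, a minimal sketch with xml.etree.ElementTree could pass a prefix map to findall(). The two URIs below are the standard OpenDocument ones and are assumed to match the declarations in content.xml; SAMPLE is a hypothetical, heavily trimmed stand-in for the real file. Note that ElementTree only accepts relative paths on an element, so the path starts with ".//" rather than "//", and that [@name='draw:page'] would test an attribute called name, not the tag name.

```python
import xml.etree.ElementTree as ET

# Prefix -> URI map. Assumption: these standard OpenDocument URIs match
# the xmlns declarations on the root element of content.xml.
NS = {
    "draw": "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}

# Hypothetical, trimmed stand-in for the real content.xml.
SAMPLE = """\
<root xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
      xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <draw:page draw:name="page3">
    <draw:frame><draw:text-box>
      <text:p><text:span>Docker</text:span></text:p>
    </draw:text-box></draw:frame>
  </draw:page>
</root>
"""

doc = ET.fromstring(SAMPLE)

# Relative path (".//"), with prefixes resolved through the NS map.
pages = doc.findall(".//draw:page", NS)

# Search each page for paragraphs, then join the text of each paragraph
# and all of its descendants (for example text:span).
paragraphs = [p for page in pages for p in page.findall(".//text:p", NS)]
texts = ["".join(p.itertext()) for p in paragraphs]
```

With the real file, the same two findall() calls would be applied to the tree parsed from content.xml.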

In the end you could try this XPath:

"//*[local-name()='page']//*[local-name()='p']//node()"

where the last part selects all descendants (text nodes as well as element nodes).
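
One caveat: xml.etree.ElementTree implements only a subset of XPath and does not support local-name() or node(), so the expression above needs a full XPath 1.0 engine. As a rough ElementTree-only equivalent (a sketch, not the same expression), Python 3.8+ accepts the "{*}" namespace wildcard in findall() paths, and itertext() stands in for the text-node selection. The helper name page_paragraph_texts and the SAMPLE string are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed stand-in for the real content.xml.
SAMPLE = (
    '<root xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"'
    ' xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
    '<draw:page draw:name="page3">'
    '<text:p><text:span>Sources</text:span></text:p>'
    '<text:p><text:span>Docker - <text:s/>Concepts</text:span></text:p>'
    '</draw:page>'
    '</root>'
)


def page_paragraph_texts(xml_source):
    """Return the text of every 'p' element found under a 'page' element.

    "{*}name" matches the local name in any namespace (Python 3.8+),
    which plays the role of local-name()='name' in full XPath.
    """
    doc = ET.fromstring(xml_source)
    texts = []
    for page in doc.findall(".//{*}page"):
        for p in page.findall(".//{*}p"):
            # itertext() yields the text of p and of all its descendants,
            # in document order, including text that follows child tags.
            texts.append("".join(p.itertext()))
    return texts


texts = page_paragraph_texts(SAMPLE)
```

Because the wildcard ignores the namespace entirely, it would also match an unrelated element that happens to be called page or p in another namespace; the explicit prefix map is stricter when the URIs are known.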
