Here is a part of an XML file that describes forms:

<?xml version="1.0" encoding="utf-8"?>
<ArrayOfHouse>
<XmlForm>
<houseNum>1</houseNum>
 <plan1> 
  <coord>
    <X> 1.2  </X>
    <Y> 2.1  </Y>
    <Z> 3.0  </Z>
  </coord>
  <color> 
    <R> 255 </R>
    <G> 0   </G>
    <B> 0   </B>
  </color>
 </plan1>
 <plan2>
  <coord>  
    <X> 21.2  </X>
    <Y> 22.1  </Y>
    <Z> 31.0  </Z>
  </coord>
  <color> 
    <R> 255 </R>
    <G> 0   </G>
    <B> 0   </B>
</color>
 </plan2> 
</XmlForm>


<XmlForm>
<houseNum>2</houseNum>
 <plan1> 
  <coord>
    <X> 11.2  </X>
    <Y> 12.1  </Y>
    <Z> 13.0  </Z>
  </coord>
  <color> 
    <R> 255 </R>
    <G> 255   </G>
    <B> 0   </B>
  </color>
 </plan1>
 <plan2>
  <coord>  
    <X> 211.2  </X>
    <Y> 212.1  </Y>
    <Z> 311.0  </Z>
  </coord>
  <color> 
    <R> 255 </R>
    <G> 0   </G>
    <B> 255   </B>
</color>
 </plan2> 
</XmlForm>
</ArrayOfHouse>

Here is my code to retrieve the coordinates of each plan for houses 1 and 2. The problem is in the line coord=tree.findall("XmlForm/[houseNum=str(houseindex)]/"+columns[index]+"/coord/"). The same error is raised when using houseindex.__str__() instead.

import pandas as pd
import numpy as np
from lxml import etree
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
tree =etree.parse("myexample.xml")
#recuperate the columns name for pandas dataframe
planlist=tree.findall("XmlForm/[houseNum='1']/")

columns=[]

for el in planlist[1:]:
    columns.append(el.tag)

#Declare pandas dataFrame
df=pd.DataFrame(columns=list('XYZ'),dtype=float)
for houseindex in range(0,2):
    for index in range(len(columns)):

        coord=tree.findall("XmlForm/[houseNum=str(houseindex)]/"+columns[index]+"/coord/")
        XYZ=[]
        for cc in coord:
            XYZ.append(cc.text)
        df.loc[index]=XYZ
print(df)

2 Answers

You clearly want str(houseindex) to be interpreted in Python before constructing your XPath expression. (Your error message is telling you that str() isn't an XPath function.)

Therefore, change the argument of coord=tree.findall() from

"XmlForm/[houseNum=str(houseindex)]/"+columns[index]+"/coord/"

to

"XmlForm/[houseNum="+str(houseindex)+"]/"+columns[index]+"/coord/"

Two more fixes to that XPath:

  1. Remove the / before the predicate on XmlForm.
  2. Add quotes around the equality test of houseNum.

Final XPath with no further syntax errors

The following XPath has all three fixes combined and has no further syntax errors:

"XmlForm[houseNum='"+str(houseindex)+"']/"+columns[index]+"/coord/"
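
As a quick check of the corrected expression, here is a minimal sketch that runs it against a trimmed copy of the XML from the question. It makes a few assumptions: it uses the standard library's xml.etree.ElementTree instead of lxml (its findall accepts the same limited path syntax), the helper name plan_coords is hypothetical, and the trailing "/" is written out as "/*", which is what it is shorthand for.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the question's XML (one plan per house).
XML = """<ArrayOfHouse>
  <XmlForm>
    <houseNum>1</houseNum>
    <plan1><coord><X> 1.2 </X><Y> 2.1 </Y><Z> 3.0 </Z></coord></plan1>
  </XmlForm>
  <XmlForm>
    <houseNum>2</houseNum>
    <plan1><coord><X> 11.2 </X><Y> 12.1 </Y><Z> 13.0 </Z></coord></plan1>
  </XmlForm>
</ArrayOfHouse>"""

root = ET.fromstring(XML)


def plan_coords(root, house_index, plan):
    """Hypothetical helper: return [X, Y, Z] of one plan of one house."""
    # str(house_index) is evaluated by Python, so the path only ever
    # contains the quoted literal, e.g. XmlForm[houseNum='1'].
    path = "XmlForm[houseNum='" + str(house_index) + "']/" + plan + "/coord/*"
    # float() tolerates the surrounding whitespace in the element text.
    return [float(el.text) for el in root.findall(path)]


house1 = plan_coords(root, 1, "plan1")
house2 = plan_coords(root, 2, "plan1")
print(house1)
print(house2)
```

A house number that does not exist in the file simply produces an empty list, not an error.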

2 Comments

I get the same error message on the same line: line 64, in <module> coord=tree.findall("XmlForm/[houseNum="+str(houseindex)+"]/"+columns[index]+"/coord/") ... raise SyntaxError("invalid predicate") SyntaxError: invalid predicate
Sorry, there are two additional XPath syntax errors to fix. Answer updated.

You never interpolate houseindex into your string: str(houseindex) sits inside the quotes, so it is passed to the XPath engine literally. Also check your houseindex loop: range(0, 2) yields 0 and 1, but the houses in your XML example are numbered 1 and 2, so you want range(1, 3).
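
To make the off-by-one concrete, this small sketch compares the two ranges with the house numbers in the sample XML. The variable names are only for illustration.

```python
# House numbers that occur in the question's XML sample.
house_numbers_in_xml = [1, 2]

old_loop = list(range(0, 2))  # what the question iterates over
new_loop = list(range(1, 3))  # what is suggested here

print(old_loop)  # [0, 1] -> house 0 does not exist, house 2 is never visited
print(new_loop)  # [1, 2] -> matches the houseNum values in the file
```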

I believe you want to have something like this (I slightly refactored your code to improve readability):

import pandas as pd
from lxml import etree

tree = etree.parse("myexample.xml")

# Retrieve the column names (plan1, plan2, ...) from house 1.
# "/*" selects all children of the matching XmlForm.
plan_list = tree.findall("XmlForm[houseNum='1']/*")
# Skip the first child, which is <houseNum> itself.
columns = [el.tag for el in plan_list[1:]]

# Collect one [X, Y, Z] row per plan of each house
data = []
for house_index in range(1, 3):
    for column in columns:

        # No "/" before the predicate; the house number is quoted
        element_text = "XmlForm[houseNum='{index}']/{column}/coord/*".format(index=house_index, column=column)
        coord = tree.findall(element_text)
        row = [cc.text for cc in coord]
        data.append(row)

df = pd.DataFrame(data, columns=list('XYZ'), dtype=float)
print(df)

1 Comment

Ideally, though, avoid constructing XPath expressions by string concatenation. (a) there's a risk of code injection attacks, and (b) you're compiling the XPath expression each time it is executed. If the Python API permits it, it's better to compile an XPath expression containing variable references and then bind values to the variable when the expression is executed.
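
For what it's worth, lxml does permit this. Below is a minimal sketch, assuming lxml is installed: etree.XPath compiles the expression once, and the XPath variables $num and $plan are bound through keyword arguments on each call. The helper name plan_coords and the trimmed XML are only for illustration.

```python
from lxml import etree

# Trimmed-down copy of the question's XML (one plan per house).
XML = b"""<ArrayOfHouse>
  <XmlForm>
    <houseNum>1</houseNum>
    <plan1><coord><X> 1.2 </X><Y> 2.1 </Y><Z> 3.0 </Z></coord></plan1>
  </XmlForm>
  <XmlForm>
    <houseNum>2</houseNum>
    <plan1><coord><X> 11.2 </X><Y> 12.1 </Y><Z> 13.0 </Z></coord></plan1>
  </XmlForm>
</ArrayOfHouse>"""

root = etree.fromstring(XML)

# Compiled once. $num and $plan are XPath variables, not Python names;
# the element name is matched via local-name() because a variable
# cannot stand in for a name test.
coords_of = etree.XPath(
    "XmlForm[houseNum = $num]/*[local-name() = $plan]/coord/*"
)


def plan_coords(house_index, plan):
    """Hypothetical helper: return [X, Y, Z] of one plan of one house."""
    # Values are bound as data, so they are never parsed as XPath syntax.
    nodes = coords_of(root, num=str(house_index), plan=plan)
    return [float(el.text) for el in nodes]


print(plan_coords(1, "plan1"))
print(plan_coords(2, "plan1"))
# A value that would break out of a concatenated string is just a
# non-matching literal here.
print(plan_coords("1' or '1'='1", "plan1"))
```

Because the values never become part of the expression text, quoting mistakes like the ones in the question cannot occur with this approach.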
