I'm attempting to use the XRDTools library to convert Panalytical XRDML files into a more database-friendly format, such as a pandas dataframe.
The XRDTools library is described here: https://github.com/paruch-group/xrdtools. It imports the XRDML file into a Python dictionary. I'm totally new to lxml, so I apologize if this is a simple question.
I've used Anaconda to create Python 2.7 and 3.6 environments specifically to work with the XRDTools package. I'd like to run it in Python 3.6.
In Python 2.7, this code runs smoothly:
import xrdtools
xrd = xrdtools.read_xrdml('filename.xrdml')
Output is a dict:
{u'2Theta': array([63. , 63.00334225, 63.00668449, ..., 67.99331551,
67.99665775, 68. ]),
u'Lambda': 1.540598,
u'Omega': array([31. , 31.00200535, 31.0040107 , ..., 33.9959893 ,
33.99799465, 34. ]), ...
I can then use the dictionary like any other Python object.
In Python 3.6, that same code generates this error message:
Traceback (most recent call last):
  File "...\AppData\Local\Continuum\Anaconda2\envs\py36xrd\lib\site-packages\IPython\core\interactiveshell.py", line 2910, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-3-b6f5409b8bf9>", line 1, in <module>
    xrd = xrdtools.read_xrdml('filename.xrdml')
  File "...\XRDTools\xrdtools\xrdtools\io.py", line 297, in read_xrdml
    valid = validate_xrdml_schema(filename)
  File "...\XRDTools\xrdtools\xrdtools\io.py", line 43, in validate_xrdml_schema
    xmlschema_doc = etree.parse(f)
  File "src\lxml\etree.pyx", line 3444, in lxml.etree.parse (src\lxml\etree.c:83171)
  File "src\lxml\parser.pxi", line 1855, in lxml.etree._parseDocument (src\lxml\etree.c:121011)
  File "src\lxml\parser.pxi", line 1875, in lxml.etree._parseFilelikeDocument (src\lxml\etree.c:121294)
  File "src\lxml\parser.pxi", line 1770, in lxml.etree._parseDocFromFilelike (src\lxml\etree.c:120078)
  File "src\lxml\parser.pxi", line 1185, in lxml.etree._BaseParser._parseDocFromFilelike (src\lxml\etree.c:114806)
  File "src\lxml\parser.pxi", line 598, in lxml.etree._ParserContext._handleParseResultDoc (src\lxml\etree.c:107724)
  File "src\lxml\parser.pxi", line 709, in lxml.etree._handleParseResult (src\lxml\etree.c:109433)
  File "src\lxml\parser.pxi", line 638, in lxml.etree._raiseParseError (src\lxml\etree.c:108287)
  File "...\XRDTools\xrdtools\xrdtools\data\schemas\XRDMeasurement15.xsd", line 1
    <?xml version="1.0" encoding="UTF-8"?>
    ^
XMLSyntaxError: Start tag expected, '<' not found, line 1, column 1
Digging into io.py, there is this function:
def validate_xrdml_schema(filename):
    """Validate the xml schema of a given file.

    Parameters
    ----------
    filename : str
        The Filename of the `.xrdml` file to test.

    Returns
    -------
    float or None
        Returns the version number as float or None if
        the file was not matching any provided xml schema.
    """
    schemas = [(1.5, 'data/schemas/XRDMeasurement15.xsd'),
               (1.4, 'data/schemas/XRDMeasurement14.xsd'),
               (1.3, 'data/schemas/XRDMeasurement13.xsd'),
               (1.2, 'data/schemas/XRDMeasurement12.xsd'),
               (1.1, 'data/schemas/XRDMeasurement11.xsd'),
               (1.0, 'data/schemas/XRDMeasurement10.xsd'),
               ]
    schemas = [(v, os.path.join(package_path, schema)) for v, schema in schemas]

    with open(filename, 'r') as f:
        data_xml = etree.parse(f)

    for version, schema in schemas:
        with open(schema, 'r') as f:
            xmlschema_doc = etree.parse(f)
        xmlschema = etree.XMLSchema(xmlschema_doc)
        valid = xmlschema.validate(data_xml)
        if valid:
            return version
    return None
From what I've read, the line xmlschema_doc = etree.parse(f) is what raises the error. If I change that line to etree.parse(filename), it runs without an error, but I'm not sure whether that change affects the validation. So far I've only been able to apply that fix in a small, self-contained cell in a Jupyter notebook.
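Here is a minimal sketch of what I suspect is going on; it is a guess, not something I have confirmed. It assumes that the .xsd files start with a UTF-8 byte order mark and that Python 3 on Windows decodes text-mode files with cp1252. The temporary file and its contents are made up for illustration, and the standard library's xml.etree.ElementTree stands in for lxml.etree so the snippet runs without extra packages.

```python
import os
import tempfile
import xml.etree.ElementTree as ET  # stand-in for lxml.etree (assumption)

# Hypothetical schema-like file: a UTF-8 byte order mark followed by XML.
BOM = b"\xef\xbb\xbf"
content = BOM + b'<?xml version="1.0" encoding="UTF-8"?><root/>'

with tempfile.NamedTemporaryFile(suffix=".xsd", delete=False) as tmp:
    tmp.write(content)
    path = tmp.name

try:
    # Text mode with a Windows-style codec: the three BOM bytes decode to
    # ordinary characters that end up in front of the XML declaration.
    with open(path, "r", encoding="cp1252") as f:
        text = f.read()
    text_starts_with_lt = text.startswith("<")

    # Parsing the decoded string fails because of the leading characters.
    try:
        ET.fromstring(text)
        text_parse_ok = True
    except ET.ParseError:
        text_parse_ok = False

    # Binary mode: the parser receives raw bytes and handles the BOM itself.
    with open(path, "rb") as f:
        root_tag = ET.parse(f).getroot().tag
finally:
    os.remove(path)

print(text_starts_with_lt, text_parse_ok, root_tag)
```

If that is really the cause, opening the schema with 'rb' instead of 'r', or passing the path itself to etree.parse (which accepts a filename), should sidestep the decoding step. That would also explain why the same code works in Python 2.7, where open(..., 'r') returns bytes.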
What causes the error? Is there a way to fix it for Python 3? What's the best way to implement that fix?
Would love to get this resolved. TIA!
The most closely related question I could find: Python 3.4 lxml.etree: Start tag expected, '<' not found, line 1, column 1