88

Good day! Need to convert xml using xslt in Python. I have a sample code in php.

How to implement this in Python or where to find something similar? Thank you!

$xmlFileName = dirname(__FILE__)."example.fb2";
$xml = new DOMDocument();
$xml->load($xmlFileName);

$xslFileName = dirname(__FILE__)."example.xsl";
$xsl = new DOMDocument;
$xsl->load($xslFileName);

// Configure the transformer
$proc = new XSLTProcessor();
$proc->importStyleSheet($xsl); // attach the xsl rules
echo $proc->transformToXML($xml);

3 Answers 3

143

Using lxml,

import lxml.etree as ET

dom = ET.parse(xml_filename)
xslt = ET.parse(xsl_filename)
transform = ET.XSLT(xslt)
newdom = transform(dom)
print(ET.tostring(newdom, pretty_print=True))
Sign up to request clarification or add additional context in comments.

7 Comments

Hello, I have one more question here. If the xml file is large, the efficiency of "newdom = transform(dom)" is very bad. I tried to parse a large xml file (>100MB), it will cost a long time (>4 hours) to transform. Do you have any good advice about transforming xml file with xslt using python lxml?
Under the hood, lxml is using libxslt to transform the XML. I don't know of anything you can do to speed this up. Maybe if you post your XSLT in a new question, someone will be able to suggest an improvement.
Thanks. I have tried using write with no luck. How do you write contents to file?
@programiss What you mean with no luck? Etree.write(file, pretty_print) is the method to use. Maybe you passed wrong argument? That's common mistake. It's supposed to be writable object (eg. file or sys.stdout, etc), so not just filename string.
With Python 3, one has to produce a unicode string when writing to stdout: print(ET.tostring(newdom, pretty_print=True, encoding="unicode"))
|
6

LXML is a widely used high performance library for XML processing in python based on libxml2 and libxslt - it includes facilities for XSLT as well.

Comments

4

The best way is to do it using lxml, but it only support XSLT 1

import os
import lxml.etree as ET

inputpath = "D:\\temp\\"
xsltfile = "D:\\temp\\test.xsl"
outpath = "D:\\output"


for dirpath, dirnames, filenames in os.walk(inputpath):
            for filename in filenames:
                if filename.endswith(('.xml', '.txt')):
                    dom = ET.parse(inputpath + filename)
                    xslt = ET.parse(xsltfile)
                    transform = ET.XSLT(xslt)
                    newdom = transform(dom)
                    infile = unicode((ET.tostring(newdom, pretty_print=True)))
                    outfile = open(outpath + "\\" + filename, 'a')
                    outfile.write(infile)

to use XSLT 2 you can check options from Use saxon with python

2 Comments

for people using XSLT3.x this is the way!
Long story short: for XSLT 2 and up, use SaxonC. As @Maliq suggests, lxml does not support any higher than XSLT 1. In my case, the line transform = ET.XSLT(xslt) errors because my xsltfile has <xsl:stylesheet version="2.0" in it. See here.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.