I am parsing an XML file with Python's minidom module. When I write the data to a file, I get this error: UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128). The output prints correctly on the command line, though. How can I fix this?

My XML file is:

    <?xml version="1.0"?>
    <Feature>
        <Word Root="ਨੌਕਰ-ਚਾਕਰ">
            <info Inflection="ਨੌਕਰਾਂ-ਚਾਕਰਾਂ">
                <posinfo gender="Masculine" number="Plural" case="Oblique" />
            </info>
        </Word>
    </Feature>

My Python code is:

import sys

from xml.dom import minidom

file=open("npu.txt","w+")
doc = minidom.parse("NPU.xml")
word = doc.getElementsByTagName("Word")
for each in word:
    # print "root"+each.getAttribute("Root")
    file.write(each.getAttribute("Root")+"\n")
    hh=each.getElementsByTagName("info")

    for each1 in hh:
        # print "inflection"+each1.getAttribute("Inflection")
        file.write(each1.getAttribute("Inflection")+"\t")

        vv=each1.getElementsByTagName("posinfo")
        for each2 in vv:
            # print each2.getAttribute("gender")
            # print each2.getAttribute("number")
            # print each2.getAttribute("case")
            file.write( each2.getAttribute("gender")+",")
            file.write( each2.getAttribute("number")+",")
            file.write(each2.getAttribute("case"))
        file.write("\n")
    file.write("--------\n")

2 Answers

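Encode the data while writing:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
file = open("npu.txt", "w+")
# Python 2: encode the unicode string to UTF-8 bytes explicitly before writing
file.write(u"ਨੌਕਰ-ਚਾਕਰ".encode("utf-8"))
file.close()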

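The problem isn't in the way you parse the XML; it is an encoding problem.

minidom returns attribute values as unicode strings. When you write them to a file opened with the built-in open(), Python 2 implicitly encodes them with the ASCII codec, which cannot represent the Gurmukhi characters in your data. Printing typically works because the console output is encoded with the terminal's own encoding instead.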
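To make the failure concrete, here is a minimal sketch that reproduces the same exception without any file or XML involved. The variable names are only illustrative, and the `u"..."` literal stands in for a value returned by `getAttribute`. The reported range 0-3 would be consistent with the four Gurmukhi code points that precede the hyphen.

```python
# -*- coding: utf-8 -*-
# Stand-in for a unicode value returned by minidom's getAttribute().
root = u"ਨੌਕਰ-ਚਾਕਰ"

try:
    # This is what happens implicitly when a unicode string is written
    # to a plain Python 2 file object.
    root.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError as exc:
    ascii_ok = False
    print(exc)

# Encoding explicitly as UTF-8 succeeds and round-trips losslessly.
utf8_bytes = root.encode("utf-8")
print(utf8_bytes.decode("utf-8") == root)
```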

Try codecs, as follows:

import codecs

file = codecs.open("npu.txt", "w+", "utf-8")
file.write("ਨੌਕਰ-ਚਾਕਰ".decode('utf-8'))
file.close()
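Applied to the loop from the question, the same idea might look like the sketch below. It makes a few assumptions that are not in the original: it uses `io.open` (which takes an `encoding` argument, much like `codecs.open`), it parses an inline copy of the XML rather than NPU.xml, it writes to a temporary directory, and the function name `dump_words` is hypothetical.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile
from xml.dom import minidom

# Inline copy of the question's XML (the original script reads NPU.xml).
XML = u"""<?xml version="1.0" encoding="UTF-8"?>
<Feature>
  <Word Root="ਨੌਕਰ-ਚਾਕਰ">
    <info Inflection="ਨੌਕਰਾਂ-ਚਾਕਰਾਂ">
      <posinfo gender="Masculine" number="Plural" case="Oblique" />
    </info>
  </Word>
</Feature>"""


def dump_words(xml_text, out_path):
    """Write roots, inflections and POS info to out_path as UTF-8 text."""
    doc = minidom.parseString(xml_text.encode("utf-8"))
    # The file object encodes every unicode string it receives as UTF-8.
    with io.open(out_path, "w", encoding="utf-8") as out:
        for word in doc.getElementsByTagName("Word"):
            out.write(word.getAttribute("Root") + u"\n")
            for info in word.getElementsByTagName("info"):
                out.write(info.getAttribute("Inflection") + u"\t")
                for pos in info.getElementsByTagName("posinfo"):
                    fields = [pos.getAttribute(k)
                              for k in ("gender", "number", "case")]
                    out.write(u",".join(fields))
                out.write(u"\n")
            out.write(u"--------\n")


# Hypothetical output location; the question writes to npu.txt.
out_path = os.path.join(tempfile.mkdtemp(), "npu.txt")
dump_words(XML, out_path)
```

Because the encoding is fixed when the file is opened, none of the individual write calls needs its own `.encode("utf-8")`.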

EDIT :

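If your script contains non-ASCII string literals, also add the special comment # -*- coding: UTF-8 -*- at the beginning of the Python source. This declares the encoding of the source file itself (in Python 2 the default source encoding is 7-bit ASCII); it does not change how unicode strings are encoded when they are written to a file, so you still need codecs or an explicit encode. Note that Python 2 identifiers are still restricted to ASCII characters.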
