6

I am using lxml to create an XML file. My sample program is:

from lxml import etree
import datetime
dt=datetime.datetime(2013,11,30,4,5,6)
dt=dt.strftime('%Y-%m-%d')
page=etree.Element('html')
doc=etree.ElementTree(page)
dateElm=etree.SubElement(page,dt)
outfile=open('somefile.xml','w')
doc.write(outfile)

It produces the following error:

dateElm=etree.SubElement(page,dt)
  File "lxml.etree.pyx", line 2899, in lxml.etree.SubElement (src/lxml/lxml.etree.c:62284)
  File "apihelpers.pxi", line 171, in lxml.etree._makeSubElement (src/lxml/lxml.etree.c:14296)
  File "apihelpers.pxi", line 1523, in lxml.etree._tagValidOrRaise (src/lxml/lxml.etree.c:26852)
ValueError: Invalid tag name u'2013-11-30'

I thought it was a Unicode error, so I tried changing the encoding of 'dt' with calls such as:

  1. str(dt)
  2. unicode(dt).encode('unicode_escape')
  3. dt.encode('ascii','ignore')
  4. dt.encode('ascii','decode')

I tried some others as well, but none of them worked; the same error message was generated every time.

4 Comments

  • Can you add the relevant fragment of your input XML? Commented Nov 30, 2013 at 16:30
  • ^ My XML file is empty. I am writing the output with the last line of code, 'doc.write(outfile)'. Commented Nov 30, 2013 at 16:33
  • It seems you are writing out the date as a tag. Is that what you meant to do? Commented Nov 30, 2013 at 16:36
  • ^ Ah, yes. I want to write something like <date>some value</date>. Commented Nov 30, 2013 at 16:39

2 Answers

10

You get the error because element names are not allowed to begin with a digit in XML. See http://www.w3.org/TR/xml/#sec-common-syn and http://www.w3.org/TR/xml/#sec-starttags. The first character of a name must be a NameStartChar, which disallows digits.

An element such as <2013-11-30>...</2013-11-30> is invalid.

An element such as <D2013-11-30>...</D2013-11-30> is OK.
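
Since the asker's comments say the goal is something like <date>some value</date>, one way to apply the rule above is to keep the date out of the tag name and store it as element text. This is only a minimal sketch: the root name 'page' and the element name 'date' are assumptions chosen for illustration, and the fallback import is there only so the snippet also runs where lxml is not installed.

```python
import datetime

try:
    from lxml import etree  # the library used in the question
except ImportError:
    # Assumption: fall back to the standard library so the sketch runs without lxml.
    from xml.etree import ElementTree as etree

dt = datetime.datetime(2013, 11, 30, 4, 5, 6).strftime('%Y-%m-%d')

root = etree.Element('page')                # hypothetical root name
date_elm = etree.SubElement(root, 'date')   # valid name: starts with a letter
date_elm.text = dt                          # the date is content, not a tag name

xml_text = etree.tostring(root, encoding='unicode')
print(xml_text)  # <page><date>2013-11-30</date></page>
```

Keeping the value in the text (or in an attribute) also means the element name stays the same for every date, which tends to make the file easier to query later.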

If your program is changed to use ElementTree instead of lxml (from xml.etree import ElementTree as etree instead of from lxml import etree), there is no error. But I would consider that a bug: lxml does the right thing and ElementTree does not.
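
The difference in behaviour can be checked with a short sketch that uses only the standard library. The root name 'page' is a hypothetical choice; the point is that ElementTree serializes the invalid name without complaint, and the result then fails to parse.

```python
from xml.etree import ElementTree as ET

root = ET.Element('page')              # hypothetical root name
ET.SubElement(root, '2013-11-30')      # accepted: ElementTree does not validate tag names

serialized = ET.tostring(root, encoding='unicode')

# Try to read the serialized text back in; a parser rejects the name.
try:
    ET.fromstring(serialized)
    round_trip_ok = True
except ET.ParseError:
    round_trip_ok = False

print(serialized)
print(round_trip_ok)
```

So the error is not avoided by switching libraries, it is only deferred until something tries to read the file.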


3 Comments

I am converting the date element to a string with str(date) before inserting it. I tried the same with 'xml.etree.ElementTree' and it worked fine. Maybe there is a problem with 'lxml', but I don't know for sure. Correct me if I am wrong.
Using str(dt) does not help. At that point in the program, dt already is a string (the return value of datetime.strftime()). lxml is correct when it rejects the 2013-11-30 tag name.
@j.f.sebastian - It's definitely an error in 'xml.etree.ElementTree': even though it saves the date-named tag to the XML file, parsing the saved file with "xml.etree.ElementTree.parse('somefile.xml')" fails with a parsing error. Adding an extra character to the start of the 'date' tag solves the problem. Thanks for the help!
1

It is not about Unicode. There is no 2013-11-30 tag in HTML. You could use the time tag instead:

#!/usr/bin/env python
from datetime import date
from lxml.html import tostring
from lxml.html.builder import E


datestr = date(2013, 11, 30).strftime('%Y-%m-%d')

page = E.html(
    E.title("date demo"),
    E('time', "some value", datetime=datestr))

with open('somefile.html', 'wb') as file:
    file.write(tostring(page, doctype='<!doctype html>', pretty_print=True))

2 Comments

My file is XML, not HTML, so I could store anything as a tag, given that it is a str (correct me if I am wrong). My code works fine if I import 'xml.etree.ElementTree'; I don't know why it is a problem in 'lxml'.
@Chandrakant: 1. Don't use html as the root tag if the document is not actually HTML. It is misleading. 2. As mzjn said, ElementTree is wrong to accept your input. lxml works as defined in the XML specification. As you said, your file is XML, so it must follow the XML specification.
