
I'm trying to read a specific field from an XML file at a URL, but I get a weird error before I even get started. Here is my code:

import urllib
from xml.dom import minidom

url1 = 'http://www.dac.unicamp.br/sistemas/horarios/grad/G5A0/indiceP.htm'
data1 = urllib.urlopen(url1)  # Python 2 API
xml1 = minidom.parse(data1)

I get this error:

Traceback (most recent call last):
  File "C:\Users\Administrator\Desktop\teste.py", line 15, in <module>
    xml1 = minidom.parse(data1)
  File "C:\Python27\lib\xml\dom\minidom.py", line 1920, in parse
    return expatbuilder.parse(file)
  File "C:\Python27\lib\xml\dom\expatbuilder.py", line 928, in parse
    result = builder.parseFile(file)
  File "C:\Python27\lib\xml\dom\expatbuilder.py", line 207, in parseFile
    parser.Parse(buffer, 0)
ExpatError: not well-formed (invalid token): line 4, column 22

Did I do anything wrong? I copied those functions from a tutorial, and it seems like it should work.

  • Seems like the page is not valid XHTML; try using BeautifulSoup. Commented Oct 18, 2012 at 15:38
  • @luke14free Oh, is that a thing? So if the page is not valid for XML parsing, is there another way I can get the information I want? If you open the page, you can see "Verão/2012" in the top right corner; that's the field I'm looking for. Commented Oct 18, 2012 at 15:41
  • Try this out: validator.w3.org. Just paste the URL into the address input field. Commented Oct 18, 2012 at 15:48
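
To make the first comment concrete, here is a small offline sketch (Python 3) of why minidom refuses this kind of page. The SAMPLE_HTML string and the try_parse_as_xml helper are made up for illustration; the real page's markup may differ, but any HTML with unquoted attributes or unclosed tags should trip the XML parser in the same way.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical stand-in for the real page: ordinary HTML, not well-formed XML
# (unquoted attribute value, <br> never closed).
SAMPLE_HTML = """<html>
<head><title>Sample</title></head>
<body>
<table width=100%><tr><td>Verão/2012<br></td></tr></table>
</body>
</html>"""


def try_parse_as_xml(text):
    """Return (True, None) if text is well-formed XML, else (False, message)."""
    try:
        minidom.parseString(text.encode("utf-8"))
        return True, None
    except ExpatError as exc:
        # expat reports the first position where the input stops being XML
        return False, str(exc)


ok, message = try_parse_as_xml(SAMPLE_HTML)
print(ok, message)
```

So the code from the tutorial is probably fine for real XML; the input is simply HTML, which needs a more forgiving parser.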

1 Answer

Use lxml.html; it handles invalid XHTML better.

import lxml.html as lh
xml1 = lh.parse('http://www.dac.unicamp.br/sistemas/horarios/grad/G5A0/indiceP.htm')
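
Once the page parses, pulling out a field could look roughly like the sketch below (Python 3, requires lxml). It runs against an inline sample rather than the live page, so the markup, the cell_texts helper, and the //td XPath are all assumptions; check the real page source to find the element that actually holds "Verão/2012".

```python
import lxml.html as lh

# Hypothetical stand-in for the real page; the actual markup may differ.
SAMPLE_HTML = """<html>
<head><title>Sample</title></head>
<body>
<table width=100%><tr><td align=right>Verão/2012 <br></td></tr></table>
</body>
</html>"""


def cell_texts(html_text):
    """Parse forgiving HTML and return the stripped text of every <td>."""
    root = lh.fromstring(html_text)
    return [td.text_content().strip() for td in root.xpath("//td")]


print(cell_texts(SAMPLE_HTML))
```

With the parsed document from the URL, the same XPath query should work on xml1.getroot() instead of the root returned by lh.fromstring.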

3 Comments

@root I heard good things about BeautifulSoup. How do they compare?
Some say BS handles badly formed source better, but from personal experience lxml.html does as well. For well-formed source, I would say lxml is superior. lxml is also much faster.
