What's the best way to go about writing a Python module that can validate HTML, especially with embedded RDFa? I'm familiar with validator.w3.org, and I'm interested in writing a custom validator that performs a similar function, but for a different standard that uses RDFa for element metadata. What are some good pieces of source code to look at, Python libraries to try out, and things to keep in mind?

1 Answer

Emmett,

I am not sure what you want to achieve, but I did write an RDFa distiller in Python. The first question you have to ask is whether you want to target XHTML or HTML5. For XHTML, there are a number of XML environments around, as well as DTDs for RDFa usage, so validating against those could work. For HTML5, you may want to use the HTML5 parser in Python. It does not 'know' about RDFa, but it can produce a DOM tree (or another representation) that you can then use to check the RDFa attributes. Note, however, that the HTML5 parser does not perform 'validation' in the sense of analyzing the HTML5 code for various possible error conditions; it just produces a tree according to the HTML5 spec.
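To make the "parse to a tree, then check the RDFa attributes" idea concrete, here is a minimal sketch. It uses only the standard library's `xml.etree.ElementTree`, so the demo covers the XHTML case. For HTML5, a call such as `html5lib.parse(text, treebuilder="etree", namespaceHTMLElements=False)` should yield a tree that the same function can walk. The function names `parse_prefixes` and `check_rdfa` and the two rules they enforce (no empty `property`/`typeof`, and every CURIE prefix or bare term must have a `prefix` or `vocab` declaration in scope) are assumptions for illustration only. A real validator would take its rules from the standard being targeted, and this sketch ignores the prefixes predefined by the RDFa initial context.

```python
import xml.etree.ElementTree as ET


def parse_prefixes(value):
    """Parse an RDFa 1.1 prefix attribute such as "dc: http://purl.org/dc/terms/"."""
    tokens = value.split()
    # Tokens alternate between "name:" and the IRI that name maps to.
    return {name[:-1]: iri
            for name, iri in zip(tokens[::2], tokens[1::2])
            if name.endswith(":")}


def check_rdfa(element, prefixes=None, vocab=None, errors=None):
    """Walk an ElementTree element and collect RDFa problems (hypothetical rules)."""
    prefixes = dict(prefixes or {})          # copy so siblings do not share scope
    errors = [] if errors is None else errors
    prefixes.update(parse_prefixes(element.get("prefix", "")))
    vocab = element.get("vocab", vocab)      # inherit vocab unless overridden
    tag = element.tag.rsplit("}", 1)[-1]     # drop the XML namespace, if any

    for attr in ("property", "typeof"):
        value = element.get(attr)
        if value is None:
            continue
        if not value.strip():
            errors.append(f"<{tag}>: empty {attr} attribute")
        for token in value.split():
            if "://" in token:
                continue                     # treat as an absolute IRI
            if ":" in token:
                prefix = token.split(":", 1)[0]
                if prefix not in prefixes:
                    errors.append(f"<{tag}>: undeclared prefix '{prefix}' in {attr}")
            elif not vocab:
                errors.append(f"<{tag}>: term '{token}' in {attr} has no vocab in scope")

    for child in element:
        check_rdfa(child, prefixes, vocab, errors)
    return errors


# Made-up sample document: "dc" is declared, "foaf" and a vocab are not.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      prefix="dc: http://purl.org/dc/terms/">
  <body>
    <h1 property="dc:title">Example</h1>
    <p property="foaf:name">Someone</p>
    <span property="author">Someone else</span>
  </body>
</html>"""

errors = check_rdfa(ET.fromstring(SAMPLE))
for message in errors:
    print(message)
```

Keeping the checker separate from the parser means the same rules can run over whichever tree the XHTML or HTML5 route produces. Note that this only checks RDFa attributes; as said above, it does not validate the HTML itself.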

I hope this helps.

Cheers

Ivan

2 Comments

For reference, the code for the RDFa distiller is on GitHub: github.com/RDFLib/pyrdfa3
Thanks very much, Ivan. I've used html5lib before and I find its lack of documentation a bit annoying, but it still might be exactly what I need. The RDFa distiller looks great too.
