0

So i have a need to process some HTML in Python, and my requirement is that i need to find a certain tag and replace it with different charecter based on the content of the charecters...

<html>
   <Head>
   </HEAD>
   <body>
     <blah>
       <_translate attr="french"> I am no one, 
           and no where <_translate>
     <Blah/>
   </body>
 </html>

Should become

<html>
   <Head>
   </HEAD>
   <body>
     <blah>
       Je suis personne et je suis nulle part
     <Blah/>
   </body>
</html>

I would like to leave the original HTML untouched an only replace the tags labeled 'important-tag'. Attributes and the contents of that tag will be important to generate the tags output.

I had though about using extending HTMLParser Object but I am having trouble getting out the orginal HTML when i want it. I think what i most want is to parse the HTML into tokens, with the orginal text in each token so i can output my desired output ... i.e. get somthing like

(tag, "<html>")
(data, "\n    ")
(tag, "<head>")
(data, "\n    ")
(end-tag,"</HEAD>")
ect...
ect...

Anyone know of a good pythonic way to accomplish this ? Python 2.7 standard libs are prefered, third party would also be useful to consider...

Thanks!

2
  • It looks like that issue was covered here: stackoverflow.com/questions/717541/parsing-html-in-python. Commented Nov 11, 2013 at 16:43
  • No it wasnt. I am aware of the DOM and HTMLParsers, I need a way to Parse and to preserve original input OR a way to perform Lexical Analysis on HTML. The existing Parsers dont seem to do it in a straight forward way... Commented Nov 11, 2013 at 16:54

1 Answer 1

2

You can use lxml to perform such a task http://lxml.de/tutorial.html and use XPath to navigate easily trough your html:

from lxml.html import fromstring
my_html = "HTML CONTENT"
root = fromstring(my_html)
nodes_to_process = root.xpath("//_translate")
for node in nodes_to_process:
    lang = node.attrib["attr"]
    translate = AWESOME_TRANSLATE(node.text, lang)
    node.parent.text = translate

I'll leave up to you the implementation of the AWESOME_TRANSLATE function ;)

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.