
I'm working on a method that takes a String of HTML and returns an analogous javax.swing.text.html.HTMLDocument.

What is the most efficient way of doing this?

The way I'm currently doing this is to use a SAX parser to parse the HTML string. I keep track of when I hit open tags (for example, <i>). When I hit the corresponding close tag (for example, </i>), I apply the italics style to the characters I've hit in between.
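For reference, the approach described above could look roughly like the following. This is only a sketch under assumptions: the class name SaxItalicsSketch is hypothetical, it builds a DefaultStyledDocument rather than an HTMLDocument, it handles only the i tag, and it requires the input to be well-formed XML because it goes through a SAX parser.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.SimpleAttributeSet;
import javax.swing.text.StyleConstants;
import javax.swing.text.StyledDocument;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper illustrating the "remember the open tag, style on close" idea.
final class SaxItalicsSketch {

    static StyledDocument parse(String xhtml) throws Exception {
        final StyledDocument doc = new DefaultStyledDocument();
        // Document offsets at which each currently open <i> element started.
        final Deque<Integer> italicStarts = new ArrayDeque<>();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("i".equalsIgnoreCase(qName)) {
                    italicStarts.push(doc.getLength());
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) throws SAXException {
                try {
                    // Append the text unstyled; styling is applied when the tag closes.
                    doc.insertString(doc.getLength(), new String(ch, start, length), null);
                } catch (BadLocationException e) {
                    throw new SAXException(e);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("i".equalsIgnoreCase(qName) && !italicStarts.isEmpty()) {
                    int from = italicStarts.pop();
                    SimpleAttributeSet italic = new SimpleAttributeSet();
                    StyleConstants.setItalic(italic, true);
                    // Italicize everything appended since the matching open tag.
                    doc.setCharacterAttributes(from, doc.getLength() - from, italic, false);
                }
            }
        };

        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return doc;
    }
}
```

Every close tag triggers a separate attribute update on the document, and each update produces its own document event.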

This certainly works, but it's not fast enough. Is there a faster way of doing this?

3 Answers

11

I agree with mouser, but with a small correction:

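// Wrap the HTML string so the editor kit can read it as a stream.
Reader stringReader = new StringReader(string);
HTMLEditorKit htmlKit = new HTMLEditorKit();
HTMLDocument htmlDoc = (HTMLDocument) htmlKit.createDefaultDocument();
// Parse the input and load the result into htmlDoc, starting at offset 0.
htmlKit.read(stringReader, htmlDoc, 0);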
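As a self-contained version of the snippet above, here is a minimal sketch. The class name HtmlDocumentLoader and the method name makeHTMLDocument are hypothetical, and setting the "IgnoreCharsetDirective" property is an optional assumption on my part: it keeps a charset meta tag in the input from interrupting the read.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper: turns an HTML string into an HTMLDocument via HTMLEditorKit.read.
final class HtmlDocumentLoader {

    static HTMLDocument makeHTMLDocument(String html) throws IOException, BadLocationException {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // Assumption: the input is already a decoded String, so charset directives can be ignored.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        try (Reader reader = new StringReader(html)) {
            // Parses the HTML and inserts the result into doc at offset 0.
            kit.read(reader, doc, 0);
        }
        return doc;
    }
}
```

Calling doc.getText(0, doc.getLength()) on the result is a quick way to check that the content was actually loaded.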

1 Comment

This really worked for me, the selected answer did not.
4

Try using the HTMLEditorKit class. It supports parsing HTML content that is read directly from a String (e.g. through a StringReader). There seems to be an article about how to do this.

Edit: To give an example, I think it could be done like this (after the code is executed, htmlDoc should contain the loaded document):

Reader stringReader = new StringReader(string);
HTMLEditorKit htmlKit = new HTMLEditorKit();
HTMLDocument htmlDoc = (HTMLDocument) htmlKit.createDefaultDocument();
HTMLEditorKit.Parser parser = new ParserDelegator();
parser.parse(stringReader, htmlDoc.getReader(0), true);

3 Comments

This looks correct, but doesn't seem to be working. Consider this test case:

public void testMakeHTMLDocument() throws Exception {
    final String hTML = "<html>\n"
            + "<body>\n"
            + "\n"
            + "<h1>My First Heading</h1>\n"
            + "\n"
            + "<p>My first paragraph.</p>\n"
            + "\n"
            + "</body>\n"
            + "</html>";
    final HTMLDocument htmlDocument = MyHTMLDocumentLoader.makeHTMLDocument(hTML);
    htmlDocument.dump(System.out);
}

It dumps this:

<html name=html >
  <body name=body >
    <p margin-top=0 resolver=NamedStyle:default {name=default,} name=p >
      <content name=content >
        [0,1][ ]
<bidi root>
  <bidi level bidiLevel=0 >
    [0,1][ ]

I'm a little afraid that this is because of HTMLEditorKit's weak HTML support; according to the Javadoc, "The default support is provided by this class, which supports HTML version 3.2 (with some extensions), and is migrating toward version 4.0". I'm afraid you'll need to handle the tags manually in the callback. I'm not sure whether this is any better than your original approach :(
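Another plausible explanation for the nearly empty dump is that the callback returned by htmlDoc.getReader(0) buffers what the parser hands it and only writes it into the document when flush() is called. HTMLEditorKit.read makes that call after parsing; the ParserDelegator snippet does not. A minimal sketch, assuming the missing flush is the cause (the class name ParserDelegatorLoader and the method name load are hypothetical):

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: drives the parser directly, then flushes the callback.
final class ParserDelegatorLoader {

    static HTMLDocument load(String html) throws IOException, BadLocationException {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // The callback collects parsed elements destined for offset 0 of doc.
        HTMLEditorKit.ParserCallback callback = doc.getReader(0);
        try (Reader reader = new StringReader(html)) {
            // true = ignore charset directives; the input is already a String.
            new ParserDelegator().parse(reader, callback, true);
        }
        // Push any elements still buffered in the callback into the document.
        callback.flush();
        return doc;
    }
}
```

If this is right, the only difference from calling htmlKit.read(...) is that the flush has to be done by hand.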
0

You could try to use the HTMLDocument.setOuterHTML method. Simply add a random element and replace it afterwards with your HTML string.

1 Comment

Just don't forget that: 'For this to work correctly, the document must have an HTMLEditorKit.Parser set. This will be the case if the document was created from an HTMLEditorKit via the createDefaultDocument method.'
