Java JTextPane HTML Editor UTF-8 characters encoding

Question

I'm using a JTextPane as a simple HTML editor.

jtp = new JTextPane();
jtp.setContentType("text/html;charset=UTF-8");
jtp.setEditorKit(new HTMLEditorKit());

When I call jtp.getText() I get nice HTML with all special characters escaped. However, I don't want national (Polish) characters to be escaped, only the special HTML characters such as &, < and >. When I enter the following in the editor

<foo>ą ś &

I get

&lt;foo&gt;&#261; &#347; &amp;

but I would like to get

&lt;foo&gt;ą ś &amp;

How is this possible?

Hmm, are you reading this data from a file or from the web? If so, you have to decode the buffer into a String using the proper Charset.
– mKorbel, Commented Nov 30, 2011 at 11:49

Oleg Mikheev · Accepted Answer · 2011-11-30 13:17:27Z

4

That's not possible, unfortunately.

There's a flaw inside javax.swing.text.html.HTMLWriter: it is hardcoded to convert any character below the space character or above 127 (that is, anything that is not printable ASCII) to its numeric representation:

default:
    if (chars[counter] < ' ' || chars[counter] > 127) {
        if (counter > last) {
            super.output(chars, last, counter - last);
        }
        last = counter + 1;
        // If the character is outside of ascii, write the
        // numeric value.
        output("&#");
        output(String.valueOf((int)chars[counter]));
        output(";");
    }
    break;
}

This logic cannot be controlled in any way.
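To see the behaviour in isolation, here is a minimal sketch that parses a small HTML string and serializes it again with the stock HTMLEditorKit, without any visible component. The class name HtmlWriterEscapingDemo and the method roundTrip are made up for this illustration. With the input from the question, the Polish letters should come back as &#261; and &#347;, while & stays as the named entity &amp;.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical demo class, not part of the JDK or of the original answer.
final class HtmlWriterEscapingDemo {

    /** Parses the given HTML and writes it back out with the stock HTMLEditorKit. */
    static String roundTrip(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        Document doc = kit.createDefaultDocument();
        try {
            // Reading from a Reader means no charset decoding is involved here.
            kit.read(new StringReader(html), doc, 0);
            StringWriter out = new StringWriter();
            // For an HTMLDocument this delegates to HTMLWriter, which does the escaping.
            kit.write(out, doc, 0, doc.getLength());
            return out.toString();
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException("HTML round trip failed", e);
        }
    }

    public static void main(String[] args) {
        // The two escapes in the literal are the Polish letters a-ogonek and s-acute.
        System.out.println(roundTrip("<html><body><p>\u0105 \u015b &amp;</p></body></html>"));
    }
}
```

The escaping happens on the way out, so it does not matter which charset the content type declares.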

BUT if you really, really need that functionality, you could do something crazy:

1. Copy and paste the HTMLWriter sources into HTMLWriterHack (in the same package, javax.swing.text.html, renaming all references to the class name inside).
2. Replace the three output lines listed above with something like output(String.valueOf(chars[counter]));
3. Copy and paste the HTMLDocument sources into HTMLDocumentHack (in the same package, javax.swing.text.html, renaming all references inside, making it extend HTMLDocument and removing clashing methods).
4. Use the CustomEditorKit listed below instead of HTMLEditorKit.

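import java.io.IOException;
import java.io.Writer;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.StyleSheet;

// HTMLWriterHack and HTMLDocumentHack are the copies made in the steps above.
// Either put this class into javax.swing.text.html as well, or import them from there.
class CustomEditorKit extends HTMLEditorKit {
    @Override
    public void write(Writer out, Document doc, int pos, int len) throws IOException, BadLocationException {
        // Serialize with the patched writer instead of the stock HTMLWriter.
        // Note: this always writes the whole document; pos and len are ignored.
        HTMLWriterHack writer = new HTMLWriterHack(out, (HTMLDocumentHack) doc);
        writer.write();
    }

    @Override
    public Document createDefaultDocument() {
        // Same setup as HTMLEditorKit.createDefaultDocument(), but with the copied document class.
        StyleSheet styles = getStyleSheet();
        StyleSheet ss = new StyleSheet();
        ss.addStyleSheet(styles);
        HTMLDocumentHack doc = new HTMLDocumentHack(ss);
        doc.setParser(getParser());
        doc.setAsynchronousLoadPriority(4);
        doc.setTokenThreshold(100);
        return doc;
    }
}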

Although the steps above work (I tested it), I certainly wouldn't recommend doing that.
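For comparison, here is a less invasive sketch that leaves the JDK sources alone. It is a different technique from the steps above: the stock writer still produces numeric references, and an editor kit subclass turns the non-ASCII ones back into characters afterwards. The class name DecodingEditorKit and its helper methods are hypothetical, and the sketch assumes that only decimal references above 127 should be decoded.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical class: post-processes the output of the stock HTMLWriter.
class DecodingEditorKit extends HTMLEditorKit {
    private static final Pattern NUMERIC_REF = Pattern.compile("&#([0-9]{1,7});");

    @Override
    public void write(Writer out, Document doc, int pos, int len)
            throws IOException, BadLocationException {
        // Let the stock implementation serialize into a buffer first.
        StringWriter buffer = new StringWriter();
        super.write(buffer, doc, pos, len);
        out.write(decodeNonAscii(buffer.toString()));
    }

    /** Replaces decimal references above 127 with the character they stand for. */
    static String decodeNonAscii(String html) {
        Matcher m = NUMERIC_REF.matcher(html);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int code = Integer.parseInt(m.group(1)); // at most 7 digits, so this fits an int
            String text = (code > 127 && code <= Character.MAX_CODE_POINT)
                    ? new String(Character.toChars(code))
                    : m.group(); // keep ASCII-range references escaped
            m.appendReplacement(sb, Matcher.quoteReplacement(text));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    /** Convenience for trying the kit without a text component. */
    static String roundTrip(String html) {
        DecodingEditorKit kit = new DecodingEditorKit();
        Document doc = kit.createDefaultDocument();
        try {
            kit.read(new StringReader(html), doc, 0);
            StringWriter out = new StringWriter();
            kit.write(out, doc, 0, doc.getLength());
            return out.toString();
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException("HTML round trip failed", e);
        }
    }
}
```

Installing it with jtp.setEditorKit(new DecodingEditorKit()) should then make jtp.getText() return the decoded form, since getText() serializes through the editor kit's write method. One limitation of this sketch is that references found in parts the writer emits without escaping, such as comments, would be decoded as well.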


2 Comments

user2889419 Over a year ago

Bro, where am I supposed to find HTMLDocumentHack?

user2889419 Over a year ago

Bro, you solved my problem too, thank you so much, I have been stuck for 2 weeks, thanks dude.

Joop Eggen · Accepted Answer · 2011-11-30 13:07:31Z

0

It is not possible: all characters above code 127 are translated to a numeric entity of the form &#number;. The special HTML characters are translated into named entities such as &lt;. So you can easily substitute the numeric ones back. (This is done in HTMLWriter.output, and there seems to be no provision for character sets whatsoever.)


2 Comments

karolkpl Over a year ago

So I can't distinguish HTML entities from non-HTML entities? So far I've come up with the pattern (&#[0-9]+;) and then apply StringEscapeUtils.unescapeHtml4 to each match ($1). It seems to work.

Joop Eggen Over a year ago

You did it right. What I meant is that with your parsing you leave ", < and > untouched, since they are written as named entities like &quot;.
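The substitution discussed in these comments can be sketched with java.util.regex alone, without StringEscapeUtils. The class NumericEntityDecoder and its methods are made up for this illustration. It assumes that only decimal references above 127 should be turned back into characters, so references in the ASCII range stay escaped and named entities such as &amp; are never matched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class NumericEntityDecoder {

    // Decimal numeric character references such as &#261;
    private static final Pattern NUMERIC_REF = Pattern.compile("&#([0-9]+);");

    /** Turns numeric references above 127 back into characters; everything else is kept. */
    static String decodeNonAscii(String html) {
        Matcher m = NUMERIC_REF.matcher(html);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String text = decodeOne(m.group(), m.group(1));
            // quoteReplacement makes sure '$' and '\' are inserted literally.
            m.appendReplacement(sb, Matcher.quoteReplacement(text));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    private static String decodeOne(String reference, String digits) {
        try {
            int code = Integer.parseInt(digits);
            if (code > 127 && code <= Character.MAX_CODE_POINT) {
                return new String(Character.toChars(code));
            }
        } catch (NumberFormatException tooLarge) {
            // More digits than an int can hold; fall through and keep the text as is.
        }
        return reference; // ASCII-range and invalid references stay escaped
    }
}
```

Because HTMLWriter writes each char value separately, a character outside the Basic Multilingual Plane arrives as two references, one per surrogate. Decoding them one at a time and concatenating the results restores the original pair.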
