
I'm using a JTextPane as a simple HTML editor:

jtp=new JTextPane();
jtp.setContentType("text/html;charset=UTF-8");
jtp.setEditorKit(new HTMLEditorKit());

When I call jtp.getText(), I get nice HTML code with all special characters escaped. However, I don't want national (Polish) characters to be escaped, only the HTML special characters such as &, < and >. When I enter this in the editor:

<foo>ą ś &

I get

&lt;foo&gt;&#261; &#347; &amp;

but I would like to get:

&lt;foo&gt;ą ś &amp;

How is this possible?

  • I use charset=cp1251 instead of charset=UTF-8. Commented Nov 30, 2011 at 11:32
  • Hmm, are you reading this data from a file or from the web? In that case you have to decode the byte buffer into the String with the proper charset. Commented Nov 30, 2011 at 11:49

2 Answers


That's not possible, unfortunately.

There's a flaw in javax.swing.text.html.HTMLWriter: it is hard-coded to convert any character that is not ASCII into its numeric character reference:

// Excerpt: the default branch of the per-character switch in HTMLWriter.output(char[], int, int)
default:
    if (chars[counter] < ' ' || chars[counter] > 127) {
        if (counter > last) {
            super.output(chars, last, counter - last);
        }
        last = counter + 1;
        // If the character is outside of ascii, write the
        // numeric value.
        output("&#");
        output(String.valueOf((int)chars[counter]));
        output(";");
    }
    break;
}

There is no setting or public API that lets you turn this logic off.
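To see this behavior without a GUI, a minimal sketch like the following serializes text through a stock HTMLEditorKit, which is roughly what JTextPane.getText() does for HTML content. The class name HtmlWriterEscapingDemo is made up for illustration, and Unicode escapes stand in for ą and ś so the source compiles the same under any file encoding.

```java
import java.io.StringWriter;
import javax.swing.text.Document;
import javax.swing.text.html.HTMLEditorKit;

public class HtmlWriterEscapingDemo {
    // Puts plain text into a default HTMLDocument and writes it back out
    // through HTMLEditorKit.write, i.e. through the stock HTMLWriter.
    static String serialize(String text) throws Exception {
        HTMLEditorKit kit = new HTMLEditorKit();
        Document doc = kit.createDefaultDocument();
        doc.insertString(0, text, null);
        StringWriter out = new StringWriter();
        kit.write(out, doc, 0, doc.getLength());
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // \u0105 is 'ą' and \u015B is 'ś'; the printed HTML body
        // should contain &lt;foo&gt;&#261; &#347; &amp;
        System.out.println(serialize("<foo>\u0105 \u015B &"));
    }
}
```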

But if you really need this functionality, you could do something crazy:

  1. Copy the HTMLWriter source into a new class HTMLWriterHack (in the same package, javax.swing.text.html, renaming the class names inside accordingly).
  2. Replace the three output lines listed above with something like output(String.valueOf(chars[counter]));
  3. Copy the HTMLDocument source into a new class HTMLDocumentHack (in the same package, javax.swing.text.html, renaming the class names inside accordingly, making it extend HTMLDocument and removing the methods that clash).
  4. Use the CustomEditorKit listed below instead of HTMLEditorKit.

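import java.io.IOException;
import java.io.Writer;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.StyleSheet;

// HTMLWriterHack and HTMLDocumentHack are the patched copies from steps 1-3.
class CustomEditorKit extends HTMLEditorKit {
    @Override
    public void write(Writer out, Document doc, int pos, int len) throws IOException, BadLocationException {
        // Pass pos/len through, as HTMLEditorKit.write does, so partial writes
        // (e.g. copying a selection) only serialize the requested range.
        HTMLWriterHack writer = new HTMLWriterHack(out, (HTMLDocumentHack) doc, pos, len);
        writer.write();
    }
    @Override
    public Document createDefaultDocument() {
        // Same setup as HTMLEditorKit.createDefaultDocument(), but creating
        // an HTMLDocumentHack instead of a plain HTMLDocument.
        StyleSheet styles = getStyleSheet();
        StyleSheet ss = new StyleSheet();
        ss.addStyleSheet(styles);
        HTMLDocumentHack doc = new HTMLDocumentHack(ss);
        doc.setParser(getParser());
        doc.setAsynchronousLoadPriority(4);
        doc.setTokenThreshold(100);
        return doc;
    }
}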

Although the steps above work (I tested it), I certainly wouldn't recommend doing that.
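As a rough illustration of what step 2 changes, here is a standalone sketch of the escaping rule the patched output method ends up applying: the markup characters still become named entities, and every other character is passed through unchanged. It is not HTMLWriter and does not plug into the editor kit; the class NonAsciiPreservingEscaper is hypothetical.

```java
public class NonAsciiPreservingEscaper {
    // Hypothetical helper mirroring the per-character switch in
    // HTMLWriter.output(char[], int, int) after the step 2 patch.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                default:
                    // The stock HTMLWriter would write "&#" + (int) c + ";" here
                    // for c > 127 or for control characters below ' '.
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // escaped equals "&lt;foo&gt;\u0105 \u015B &amp;", i.e. &lt;foo&gt;ą ś &amp;
        String escaped = escape("<foo>\u0105 \u015B &");
        System.out.println(escaped);
    }
}
```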


2 Comments

Bro, where am I supposed to find the HTMLDocumentHack?
Bro, you solved my problem too. Thank you so much, I had been stuck for 2 weeks. Thanks, dude.

It is not possible: all characters above code 127 are translated to a numeric entity, &#number;. The HTML special characters are translated into named entities such as &lt;, so you can easily substitute the numeric ones back. (This is done in HTMLWriter.output, and there seems to be no provision for character sets whatsoever.)

2 Comments

So I can't distinguish HTML entities from non-HTML entities? So far I have come up with the pattern (&#[0-9]+;) and then StringEscapeUtils.unescapeHtml4($1). It seems to work.
You did it right. I meant that your parsing leaves the escaped forms of ", < and > untouched, because they are written as named entities like &quot;.
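A JDK-only sketch of the post-processing approach discussed in these comments, using java.util.regex instead of StringEscapeUtils.unescapeHtml4 from Apache Commons. As an assumption, it only turns references above 127 back into characters, since those are the ones HTMLWriter produces for non-ASCII text; the class name NumericEntityUnescaper is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumericEntityUnescaper {
    // Decimal numeric character references such as &#261;
    private static final Pattern NUMERIC_REF = Pattern.compile("&#([0-9]+);");

    // Replaces each &#N; with N > 127 by the character itself. Named entities
    // (&lt; &gt; &amp; &quot;) and references below 128 are left as they are,
    // so the markup stays well-formed.
    static String unescapeNonAscii(String html) {
        Matcher m = NUMERIC_REF.matcher(html);
        StringBuffer out = new StringBuffer(html.length());
        while (m.find()) {
            String replacement = m.group(); // keep the reference by default
            try {
                int code = Integer.parseInt(m.group(1));
                if (code > 127 && Character.isValidCodePoint(code)) {
                    replacement = new String(Character.toChars(code));
                }
            } catch (NumberFormatException e) {
                // Too many digits for an int, so not a real character: keep it.
            }
            // quoteReplacement stops '$' or '\' being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String result = unescapeNonAscii("&lt;foo&gt;&#261; &#347; &amp;");
        System.out.println(result);
    }
}
```

Because HTMLWriter writes one reference per UTF-16 char, a character outside the Basic Multilingual Plane arrives as two references (a surrogate pair); converting each one separately, as above, rebuilds the original pair in the Java String.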
