I have an XML file that describes the content of an ODT file. Using Java, I would like to replace the character sequences that cause problems with the bullet operator, the white bullet, and the black square.

In particular, given the following XML element:

<text:list-level-style-bullet text:level="1" text:style-name="Bullet_20_Symbols" style:num-suffix="." text:bullet-char="â?¢">
            <style:list-level-properties text:space-before="0.25in" text:min-label-width="0.25in" />
</text:list-level-style-bullet>

I want to use a regex to replace the â?¢ sequence with the bullet operator.

Similarly, given

<text:list-level-style-bullet text:level="2" text:style-name="Bullet_20_Symbols" style:num-suffix="." text:bullet-char="â?£">
            <style:list-level-properties text:space-before="0.5in" text:min-label-width="0.25in" />
</text:list-level-style-bullet>

I want to use a regex to replace the â?£ sequence with the white bullet.

Finally, as in the previous cases, I want to use a regular expression to replace the â?? sequence with the black square.

So, given the file content in a String (the result variable), I tried the replaceAll method with regexes as follows:

result = result.replaceAll("â[?]¢", "\u2219");
result = result.replaceAll("â[?]£", "\u25E6");
result = result.replaceAll("â([?]){2}", "\u25A0");

This code has no effect. The regular expressions correctly match the parts of the text that I want (I tested them with Rubular), and I specify the Unicode code point of the symbol I want to substitute in.
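One way to check the three replacements in isolation from any file I/O is a small in-memory test like the sketch below. The class name `BulletFix` and the method `fixBullets` are hypothetical, introduced only for illustration. The special characters are written as Unicode escapes so that the encoding of the Java source file itself cannot interfere.

```java
class BulletFix {
    // Applies the three replacements from the question to an in-memory string.
    // \u00E2 is a-circumflex, \u00A2 is the cent sign, \u00A3 is the pound sign.
    static String fixBullets(String s) {
        s = s.replaceAll("\u00E2[?]\u00A2", "\u2219"); // a-circumflex ? cent  -> bullet operator
        s = s.replaceAll("\u00E2[?]\u00A3", "\u25E6"); // a-circumflex ? pound -> white bullet
        s = s.replaceAll("\u00E2[?]{2}", "\u25A0");    // a-circumflex ? ?     -> black square
        return s;
    }

    public static void main(String[] args) {
        // Attribute fragment modelled on the XML in the question.
        String before = "text:bullet-char=\"\u00E2?\u00A2\"";
        String after = fixBullets(before);
        // Compare against the expected result instead of printing the symbol,
        // since the console encoding may not be able to display it.
        System.out.println(after.equals("text:bullet-char=\"\u2219\""));
    }
}
```

If this check succeeds while the saved file still looks wrong, the replacement itself is probably not at fault, and the way the string is read from or written to disk is worth inspecting next.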

Can someone help me? Thank you.

  • replace does not accept a regex, but a simple string. Use replaceAll Commented Feb 8, 2016 at 10:35
  • @WiktorStribiżew Sorry, I copied the wrong portion of code. I'm now using this: result = result.replaceAll("â[?]¢", "\u2219"); result = result.replaceAll("â[?]£", "\u25E6"); result = result.replaceAll("â([?]){2}", "\u25A0"); but it still has no effect. Commented Feb 8, 2016 at 10:38
  • It works, check the demo. Commented Feb 8, 2016 at 10:41
  • I finally solved it. The problem was the encoding: after updating the string content of the file, I tried to save it to a temporary file. The regex replacement worked correctly, but when I wrote the file the new characters were encoded incorrectly. So I used the ISO-8859-15 encoding instead of UTF-8, with an OutputStreamWriter. Commented Feb 9, 2016 at 8:20
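
The last comment does not include its code, so the following is only a sketch of writing text through an `OutputStreamWriter` with an explicitly chosen charset instead of the platform default. The class `EncodedWrite` and its methods are hypothetical names. The charset is a parameter; the demo passes UTF-8 purely as an assumption. Whichever charset is used must be able to represent the replacement symbols and must match what the consumer of the file expects.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodedWrite {
    // Writes the text to the stream using the given charset, never the platform default.
    static void write(String content, OutputStream out, Charset charset) throws IOException {
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(content);
        }
    }

    // Convenience helper: returns the bytes that write() would put on disk.
    static byte[] encode(String content, Charset charset) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            write(content, buffer, charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // The bullet operator U+2219 occupies three bytes in UTF-8.
        byte[] bytes = encode("\u2219", StandardCharsets.UTF_8);
        System.out.println(bytes.length);
    }
}
```

To write to a real file, the same `write` method can be given a `java.io.FileOutputStream` for the target path in place of the in-memory buffer.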
