I need to remove certain characters (non-ASCII, control, and other non-printable characters) from a string, as shown below:

private static final String NON_ASCII_CHARACTERS = "[^\\x00-\\x7F]";
private static final String ASCII_CONTROL_CHARACTERS = "[\\p{Cntrl}&&[^\r\n\t]]";
private static final String NON_PRINTABLE_CHARACTERS = "\\p{C}";

stringValue.replaceAll(NON_ASCII_CHARACTERS, "")
        .replaceAll(ASCII_CONTROL_CHARACTERS, "")
        .replaceAll(NON_PRINTABLE_CHARACTERS, "");

Can the code above be refactored to use a single replaceAll call, with all of the conditions combined into one pattern?

Is there a way to do this? Please advise.

  • You can separate regular expressions with | to do an "or". Commented May 19, 2022 at 0:07
  • @DawoodibnKareem Could you please give an example? Commented May 19, 2022 at 0:16
  • You mean an example like "re1|re2"? Is this really such a mystery? Commented May 19, 2022 at 0:56
  • The way you combine the patterns seems to contradict your goals. The definition of ASCII_CONTROL_CHARACTERS implies that you want to keep tabs and line breaks, but NON_PRINTABLE_CHARACTERS includes them, so you end up removing them. In fact, tabs and line breaks are the only non-printable characters left by the time the third replace operation runs. I think you are better off first deciding what you actually want to keep, which is only a small deviation from your first pattern, i.e. stringValue.replaceAll("[^\r\n\t\\x20-\\x7F]", "") and that's it. Commented May 19, 2022 at 8:28
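
To illustrate the point made in the last comment, here is a minimal sketch comparing the question's three-step chain with the single keep-list pattern. The class name, method names, and sample input are made up for illustration.

```java
// Hypothetical demo class; names and sample input are illustrative only.
class KeepListDemo {
    // The three patterns from the question, applied one after another.
    static String chained(String s) {
        return s.replaceAll("[^\\x00-\\x7F]", "")
                .replaceAll("[\\p{Cntrl}&&[^\r\n\t]]", "")
                .replaceAll("\\p{C}", "");
    }

    // The single pattern from the comment: keep CR, LF, TAB and 0x20-0x7F.
    static String keepList(String s) {
        return s.replaceAll("[^\r\n\t\\x20-\\x7F]", "");
    }

    public static void main(String[] args) {
        // Contains a tab, a newline, an accented letter and a control character.
        String sample = "a\tb\nc\u00e9\u0001d";
        // The chain also strips the tab and the newline.
        System.out.println(chained(sample).equals("abcd"));
        // The keep-list pattern preserves the tab and the newline.
        System.out.println(keepList(sample).equals("a\tb\ncd"));
    }
}
```

Note that the range \x20-\x7F as written also keeps DEL (0x7F); ending the range at \x7E would drop it, if that is what you want.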

3 Answers

You can use the regex alternation ("or") operator |:

private static final String NON_ASCII_CHARACTERS = "[^\\x00-\\x7F]";
private static final String ASCII_CONTROL_CHARACTERS = "[\\p{Cntrl}&&[^\r\n\t]]";
private static final String NON_PRINTABLE_CHARACTERS = "\\p{C}";

public static String process(String stringValue) {
    return stringValue.replaceAll(
            NON_ASCII_CHARACTERS + "|" + ASCII_CONTROL_CHARACTERS + "|" + NON_PRINTABLE_CHARACTERS, "");
}

public static void main(String[] args) {
    String val = process("A9339a0zzz]3");
    System.out.println(val);
}
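
The sample input above contains only printable ASCII, so it passes through unchanged. As a quick check with characters that actually get stripped, here is a sketch that compares the alternation against the original chain of three calls. The class name, method names, and sample input are assumptions for illustration.

```java
// Hypothetical demo class; names and sample input are illustrative only.
class AlternationDemo {
    static final String NON_ASCII_CHARACTERS = "[^\\x00-\\x7F]";
    static final String ASCII_CONTROL_CHARACTERS = "[\\p{Cntrl}&&[^\r\n\t]]";
    static final String NON_PRINTABLE_CHARACTERS = "\\p{C}";

    // Original approach: three passes over the string.
    static String chained(String s) {
        return s.replaceAll(NON_ASCII_CHARACTERS, "")
                .replaceAll(ASCII_CONTROL_CHARACTERS, "")
                .replaceAll(NON_PRINTABLE_CHARACTERS, "");
    }

    // Single pass: a character is removed if any of the three patterns matches it.
    static String alternated(String s) {
        return s.replaceAll(
                NON_ASCII_CHARACTERS + "|" + ASCII_CONTROL_CHARACTERS + "|" + NON_PRINTABLE_CHARACTERS, "");
    }

    public static void main(String[] args) {
        // Accented letter, BEL control character, zero-width space, and a tab.
        String sample = "Caf\u00e9\u0007 \u200Bok\t!";
        System.out.println(alternated(sample));                         // expected: Caf ok!
        System.out.println(alternated(sample).equals(chained(sample))); // expected: true
    }
}
```

Because each of the three patterns matches exactly one character at a time, removing anything that matches any of them in one pass should give the same result as the three passes.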

According to the Pattern javadocs, it should also be possible to combine the three character class patterns into a single character class:

private static final String NON_ASCII_CHARACTERS = "[^\\x00-\\x7F]";
private static final String ASCII_CONTROL_CHARACTERS = "[\\p{Cntrl}&&[^\r\n\t]]";
private static final String NON_PRINTABLE_CHARACTERS = "\\p{C}";

becomes

private static final String COMBINED =
  "[[^\\x00-\\x7F][\\p{Cntrl}&&[^\r\n\t]]\\p{C}]";

or

private static final String COMBINED =
    "[" + NON_ASCII_CHARACTERS + ASCII_CONTROL_CHARACTERS 
        + NON_PRINTABLE_CHARACTERS + "]";

Note that && (intersection) has lower precedence than the implicit union operator, so all of the [ and ] meta-characters in the patterns above are required.

You decide which version you think is clearer. It is a matter of opinion.
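
As a hedged check of the combined character class, the sketch below builds the pattern by concatenation and applies it with a single replaceAll call. The class name, method name, and sample input are assumptions for illustration.

```java
// Hypothetical demo class; names and sample input are illustrative only.
class CombinedClassDemo {
    static final String NON_ASCII_CHARACTERS = "[^\\x00-\\x7F]";
    static final String ASCII_CONTROL_CHARACTERS = "[\\p{Cntrl}&&[^\r\n\t]]";
    static final String NON_PRINTABLE_CHARACTERS = "\\p{C}";

    // Union of the three classes, wrapped in one outer character class.
    static final String COMBINED =
            "[" + NON_ASCII_CHARACTERS + ASCII_CONTROL_CHARACTERS
                + NON_PRINTABLE_CHARACTERS + "]";

    static String clean(String s) {
        return s.replaceAll(COMBINED, "");
    }

    public static void main(String[] args) {
        // The concatenated form is the same string as the hand-written literal.
        System.out.println(COMBINED.equals("[[^\\x00-\\x7F][\\p{Cntrl}&&[^\r\n\t]]\\p{C}]"));
        // Accented letter, BEL control character, zero-width space, and a tab.
        System.out.println(clean("Caf\u00e9\u0007 \u200Bok\t!")); // expected: Caf ok!
    }
}
```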

Code point

You might consider an alternative to regex. You can work with the code point (an integer) of each character, and query the Character class for that character's category.

String input = … ;
String output = 
    input
    .codePoints()  // Returns an `IntStream` of code point `int` values.
    .filter( codePoint -> ! Character.isISOControl( codePoint ) )  // Filter for the characters you want to keep. Code points that fail the predicate test are omitted. 
    .filter( codePoint -> codePoint < 127 )  // Within US-ASCII range. Code point 127 is US-ASCII but is DEL, so we filter that out here. 
    .collect( StringBuilder :: new , StringBuilder :: appendCodePoint , StringBuilder :: append )  // Convert the `int` code point integers back into characters. 
    .toString() ;  // Make a `String` from the contents of the `StringBuilder`. 

The Character class has many of the classifications defined by the Unicode Consortium. You can use them to narrow down the stream of code points to those which represent your desired characters.
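
As one illustration of narrowing the stream, here is a self-contained sketch of the same pipeline that also keeps tab, carriage return, and line feed, which the ASCII_CONTROL_CHARACTERS pattern in the question appears to want preserved. The class name, method name, and sample input are assumptions for illustration, not part of the original answer.

```java
// Hypothetical demo class; names and sample input are illustrative only.
class CodePointFilterDemo {
    static String clean(String input) {
        return input
                .codePoints()                       // IntStream of code points
                .filter(cp -> cp < 127)             // ASCII only, excluding DEL (127)
                .filter(cp -> cp == '\t' || cp == '\r' || cp == '\n'
                        || !Character.isISOControl(cp)) // drop controls except TAB, CR, LF
                .collect(StringBuilder::new,
                         StringBuilder::appendCodePoint,
                         StringBuilder::append)     // rebuild the text from code points
                .toString();
    }

    public static void main(String[] args) {
        // Accented letter, BEL control character, zero-width space, and a tab.
        String sample = "Caf\u00e9\u0007 \u200Bok\t!";
        // The tab is kept; the other three special characters are removed.
        System.out.println(clean(sample).equals("Caf ok\t!")); // expected: true
    }
}
```

Dropping the explicit TAB, CR, and LF checks gives the behavior of the pipeline shown above, which removes every ISO control character.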
