
I have searched for an answer for a couple of hours now, and nothing I've found comes close to solving my specific problem. This is neither for school nor for work. I'm developing an app that performs predefined data-cleansing tasks based on regular expressions. One expression I'm having trouble with is the one that removes whitespace characters between a word and a number. Below are example requirements:

word 123           ==> word123
123 word           ==> 123word
world 123 wide     ==> world123wide
world wide 123     ==> world wide123
world wide 123 456 ==> world wide123 456

Regex lookaround seems to be the right approach, but I still can't figure out how to apply the expression to phrases with more than two word blocks.

Thanks in advance.

1 Answer


Use a combination of lookarounds and alternation between two patterns, as follows:

public class Main {
    public static void main(String[] args) {
        //                | preceded by digit
        //                |      | one whitespace
        //                |      |   | followed by non-digit
        //                |      |   |      | OR
        //                |      |   |      | | preceded by non-digit
        //                |      |   |      | |      | one whitespace
        //                |      |   |      | |      |   | followed by digit
        String pattern = "(?<=\\d)\\s(?=\\D)|(?<=\\D)\\s(?=\\d)";
        // test Strings
        String test0 = "word 123";
        String test1 = "123 word";
        String test2 = "world 123 wide";
        String test3 = "world wide 123";
        String test4 = "world wide 123 456";
        // testing output: replace all found matches with an empty String
        // (one match per String, except test2, which has two)
        System.out.println(test0.replaceAll(pattern, ""));
        System.out.println(test1.replaceAll(pattern, ""));
        System.out.println(test2.replaceAll(pattern, ""));
        System.out.println(test3.replaceAll(pattern, ""));
        System.out.println(test4.replaceAll(pattern, ""));
    }
}

Output:

word123
123word
world123wide
world wide123
world wide123 456
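Since the app applies predefined cleansing rules over and over, it may be worth compiling the pattern once. `String.replaceAll(regex, replacement)` is documented as equivalent to `Pattern.compile(regex).matcher(str).replaceAll(replacement)`, so it recompiles the regex on every call. The following is a minimal sketch that reuses a precompiled `Pattern` with the same regex as above; the class name `WordNumberJoiner` and the method `join` are hypothetical, chosen only for illustration:

```java
import java.util.regex.Pattern;

// Hypothetical helper for a data-cleansing step; the names are illustrative only.
public class WordNumberJoiner {

    // Same regex as in the answer, compiled once and reused for every input.
    private static final Pattern WORD_NUMBER_SPACE =
            Pattern.compile("(?<=\\d)\\s(?=\\D)|(?<=\\D)\\s(?=\\d)");

    // Removes a single whitespace character sitting between a digit and a
    // non-digit (in either order); all other whitespace is left untouched.
    public static String join(String input) {
        return WORD_NUMBER_SPACE.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String[] samples = {
            "word 123", "123 word", "world 123 wide",
            "world wide 123", "world wide 123 456"
        };
        for (String s : samples) {
            System.out.println(s + " ==> " + join(s));
        }
    }
}
```

A `Pattern` is immutable and safe to share between threads, whereas a `Matcher` is not; that is why the sketch creates a fresh `Matcher` on each call instead of storing one in a field.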

Comments

+1 but I would probably use \\s+ rather than just \\s: "whitespace" usually means any number of whitespace characters
@Mena - A true regex wizard
@Bohemian good point. I added "one whitespace" in my comments, but it could easily be replaced by any number thereof :)
Love the "drop-down" helper explanations.
That was quick! Now I know why I couldn't make it work for phrases with more than two word blocks: I was missing the second lookaround alternative. Thank you so much.
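Bohemian's `\s+` suggestion needs one extra change to be safe. `\D` also matches whitespace, so once the middle part is `\s+`, the regex engine can backtrack and match only part of a whitespace run. With the lookarounds left as `\D`, the input `"123  456"` (two spaces, digits on both sides) would collapse to `"123456"`. A minimal sketch of a multi-space variant that excludes whitespace from the non-digit side with `[^\d\s]`; the class name `WordNumberJoinerMulti` is hypothetical:

```java
import java.util.regex.Pattern;

// Hypothetical variant: removes a whole run of whitespace (one or more
// characters) between a digit and a character that is neither a digit
// nor whitespace, in either order.
public class WordNumberJoinerMulti {

    // [^\\d\\s] means "neither a digit nor whitespace". Using \\D here instead
    // would let a partial whitespace run satisfy the lookaround, so that
    // "123  456" would wrongly become "123456".
    private static final Pattern WORD_NUMBER_SPACES =
            Pattern.compile("(?<=\\d)\\s+(?=[^\\d\\s])|(?<=[^\\d\\s])\\s+(?=\\d)");

    public static String join(String input) {
        return WORD_NUMBER_SPACES.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String[] samples = {
            "word   123", "123\tword", "world  wide  123", "123  456"
        };
        for (String s : samples) {
            System.out.println("[" + s + "] ==> [" + join(s) + "]");
        }
    }
}
```

Leading and trailing whitespace is not touched, because each alternative requires a non-whitespace character on both sides of the run.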
