JAVA SPLIT - split without removing whitespaces

Question

I'm having difficulty splitting a string without removing whitespace while still removing all other non-word characters. For a school task I have to read in text with BufferedReader, and the text contains lots of characters that even Eclipse couldn't display. The elements I read in look like element1;element 2; element 3 (Element 4; Element 5 $Element 6 etc., and one of the delimiters to remove should be ";".

I've tried .split("\\W"), but it also split on all the whitespace, and some elements came out completely empty, although it did remove the other characters well.
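A small demo (the class name is hypothetical) of what .split("\\W") does to the sample line: \W matches any single character outside [a-zA-Z_0-9], so every space is a delimiter, and each pair of adjacent delimiters such as "; " or " (" leaves an empty string between them.

```java
import java.util.Arrays;

// Hypothetical demo class for the behaviour described in the question.
class WordCharSplitDemo {

    // \W matches ONE character that is not [a-zA-Z_0-9]; spaces count as delimiters too.
    static String[] splitOnNonWord(String line) {
        return line.split("\\W");
    }

    public static void main(String[] args) {
        String line = "element1;element 2; element 3 (Element 4; Element 5 $Element 6";
        // prints [element1, element, 2, , element, 3, , Element, 4, , Element, 5, , Element, 6]
        System.out.println(Arrays.toString(splitOnNonWord(line)));
    }
}
```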

Right now I'm using .split("[;(),$]"), but this doesn't work properly because the text still contains characters that I can't recognize.
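When the text contains characters the editor can't show, it can help to print their code points and Unicode names before choosing delimiters. A minimal sketch, assuming you already have the line as a String; the class and method names are hypothetical, while Character.getName, Character.isLetterOrDigit and Character.isWhitespace are standard JDK methods:

```java
// Hypothetical helper: lists every character that is not a letter, digit or whitespace,
// with its code point and Unicode name, so "invisible" delimiters can be identified.
class CharInspector {

    static String describeUnusual(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints()
            .filter(cp -> !Character.isLetterOrDigit(cp) && !Character.isWhitespace(cp))
            .distinct() // report each unusual character only once
            .forEach(cp -> sb.append(String.format("U+%04X %s%n", cp, Character.getName(cp))));
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample input with a no-break space and a currency sign mixed in.
        System.out.print(describeUnusual("element1;element\u00A02\u00A4"));
        // U+003B SEMICOLON
        // U+00A0 NO-BREAK SPACE
        // U+00A4 CURRENCY SIGN
    }
}
```

Note that a no-break space shows up in this list: Character.isWhitespace deliberately excludes it, and the default (non-Unicode) \s in a regex does not match it either.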

Peter Lawrey · Accepted Answer · 2014-05-03 12:39:04Z

1

Instead of trying to split on all the characters you don't want, you could list the characters you do want, e.g.

String[] words = s.split("[^ a-zA-Z0-9]+");

Note: the ^ at the start of a character class means "anything but these characters".

BTW: none of the characters are non-characters.
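Applied to the sample line from the question, this regex keeps the words together but leaves the spaces that sat next to a delimiter. A small demo (the class name is hypothetical); call trim() on each element if those spaces are unwanted:

```java
// Sketch of the "list what you want to keep" approach from this answer.
class KeepWantedChars {

    // [^ a-zA-Z0-9]+ : a run of one or more characters outside the allowed set.
    static String[] split(String s) {
        return s.split("[^ a-zA-Z0-9]+");
    }

    public static void main(String[] args) {
        String s = "element1;element 2; element 3 (Element 4; Element 5 $Element 6";
        for (String word : split(s)) {
            // Brackets make surviving leading/trailing spaces visible, e.g. "[ element 3 ]".
            System.out.println("[" + word + "]");
        }
    }
}
```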

edited May 3, 2014 at 12:39

answered May 3, 2014 at 12:31

Peter Lawrey

535k reputation · 83 gold badges · 770 silver badges · 1.2k bronze badges


4 Comments

Pshemo Over a year ago

+1 for a simple regex. Still, it might be a good idea to exclude all whitespace characters from the split (not just the space), to avoid splitting something like element\n10.

charen Over a year ago

This option, as well as the one below, leaves empty elements in the arrays. I could fix that by writing a method to remove them (I had one before, but while restructuring my code I deleted it, thinking it was no longer necessary). Or is there another way to avoid empty elements?

Peter Lawrey Over a year ago

@Pshemo I would add only the whitespace you expect, because there are quite a few whitespace characters and developers don't always think about them.
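The two suggestions above differ in which whitespace survives inside an element. A hypothetical demo comparing them on an input containing a newline and a tab; the third variant, listing only space and newline, is just an example of "the whitespace you expect":

```java
// Hypothetical demo comparing three ways of keeping whitespace inside elements.
class WhitespaceVariants {

    // Only the plain space is allowed, so a newline or tab acts as a delimiter.
    static String[] spaceOnly(String s) {
        return s.split("[^ a-zA-Z0-9]+");
    }

    // \s allows space, tab, newline, vertical tab, form feed and carriage return.
    static String[] anyWhitespace(String s) {
        return s.split("[^\\sa-zA-Z0-9]+");
    }

    // Only the whitespace you expect: here, as an example, space and newline.
    static String[] expectedWhitespace(String s) {
        return s.split("[^ \\na-zA-Z0-9]+");
    }

    public static void main(String[] args) {
        String s = "element\n10;next\tone";
        System.out.println(spaceOnly(s).length);          // 4: element | 10 | next | one
        System.out.println(anyWhitespace(s).length);      // 2: element\n10 | next\tone
        System.out.println(expectedWhitespace(s).length); // 3: element\n10 | next | one
    }
}
```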

Peter Lawrey Over a year ago

@charen I added a + so that a run of consecutive delimiters is treated as a single split, which skips the empty elements. It won't drop a leading empty element, though.
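A sketch of the remaining case: String.split discards trailing empty strings, but a delimiter at the very start still produces a leading "". Filtering the array with a stream is one way to remove any empty element that is left (class and method names are hypothetical):

```java
import java.util.Arrays;

// Sketch for charen's question: drop empty elements after splitting.
class EmptyElements {

    // With the +, ";;" is a single delimiter, but a leading "(" still yields a leading "".
    static String[] rawSplit(String s) {
        return s.split("[^ a-zA-Z0-9]+");
    }

    // Keeps only non-empty parts, wherever the empty ones occur.
    static String[] nonEmpty(String s) {
        return Arrays.stream(rawSplit(s))
                .filter(part -> !part.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String s = "(element1;;element 2$";
        System.out.println(Arrays.toString(rawSplit(s))); // [, element1, element 2]
        System.out.println(Arrays.toString(nonEmpty(s))); // [element1, element 2]
    }
}
```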

Pshemo · Accepted Answer · 2014-05-03 12:53:42Z

0

If \\W worked fine for you and the only problem was that it also split on whitespace, you can use the intersection of \\W and \\S, which removes all whitespace characters from \\W.

Use split("[\\W&&\\S]+")

Also, to remove the whitespace surrounding results like _element 3 (where _ represents a whitespace character), you can surround the regex with \\s*. To add Unicode support to predefined character classes such as \\W, just add the (?U) flag to the regex.

Demo:

String data = "element1;element 2; element 3 (Element 4; Element 5 $Element 6 ";
for (String s : data.split("(?U)\\s*[\\W&&\\S]+\\s*")) {
    System.out.println(s);
}

Output:

element1
element 2
element 3
Element 4
Element 5
Element 6
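To see what the (?U) flag changes, here is the same pattern with and without it on a hypothetical line containing the Estonian words mentioned in the comments. Without the flag, \W is ASCII-only ([^a-zA-Z_0-9]), so "ä" is treated as a delimiter; with it (Pattern.UNICODE_CHARACTER_CLASS), letters such as "ä" count as word characters:

```java
// Sketch: the answer's split pattern with and without the (?U) flag.
class UnicodeFlagDemo {

    static final String PATTERN = "\\s*[\\W&&\\S]+\\s*";

    // ASCII-only predefined classes: the a-umlaut matches \W and splits the word.
    static String[] asciiClasses(String s) {
        return s.split(PATTERN);
    }

    // Unicode-aware predefined classes: the a-umlaut is a word character.
    static String[] unicodeClasses(String s) {
        return s.split("(?U)" + PATTERN);
    }

    public static void main(String[] args) {
        String line = "v\u00e4ga kiire;5"; // escape keeps this source file ASCII-only
        System.out.println(String.join(" | ", asciiClasses(line)));   // v | ga kiire | 5
        System.out.println(String.join(" | ", unicodeClasses(line))); // the whole words, then 5
    }
}
```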

edited May 3, 2014 at 12:53

answered May 3, 2014 at 12:32

Pshemo

125k reputation · 26 gold badges · 194 silver badges · 280 bronze badges

7 Comments

charen Over a year ago

This seems to work fine, but now it seems that \\W also takes away non-ASCII characters (which my language uses), so if I read "ä", "ö" or "ü" from the text file, it splits on them as well. Any idea what to add so it would skip these too?

charen Over a year ago

Hmm, it still splits on the so-called non-ASCII character:
vĆ
Exception in thread "main" java.lang.NumberFormatException: For input string: "ga kiire"

Pshemo Over a year ago

A regex doesn't throw NumberFormatException. It seems that you are trying to parse "ga kiire" with Integer.parseInt or something similar. For now I can only guess what the problem with your data or code is. To make your question answerable, please include an example that can be used to reproduce your problem.

charen Over a year ago

No, no, the NumberFormatException is thrown because the string is split in the wrong place. Without this extra split in the Estonian words "väga kiire", there would be another element in this position, which is an integer. Right now I believe the word causing the problem is " vĆga kiire".

Pshemo Over a year ago

I understand that, but without seeing exactly how your data should be split I can't help you. I provided an answer that solves your problem as it is written now. As I said earlier, you need to provide an example that lets me reproduce your current problem. Post the data you are trying to split, the expected result, and how it is actually being split, so I can see what could cause this behaviour.
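The thread never confirms it, but " vĆga kiire" is consistent with a decoding problem: if the file is UTF-8 (an assumption) and the BufferedReader decodes it with a different charset, the single "ä" becomes two characters, one of which is not a word character even with (?U), so the regex splits there. A sketch that simulates this in memory; ISO-8859-1 is used only because every JVM supports it, and the class and method names are hypothetical:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: the same UTF-8 bytes read with a wrong and with a matching charset.
class CharsetDemo {

    static final String PATTERN = "(?U)\\s*[\\W&&\\S]+\\s*";

    // Reads the first line of the bytes with an explicit charset, like a BufferedReader over a file.
    static String readLine(byte[] bytes, Charset charset) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), charset))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the file contents: the line encoded as UTF-8 bytes.
        byte[] fileBytes = "v\u00e4ga kiire;5".getBytes(StandardCharsets.UTF_8);

        // Wrong charset: the two bytes of the a-umlaut become U+00C3 and U+00A4;
        // U+00A4 (CURRENCY SIGN) is not a word character, so the split happens inside the word.
        String wrong = readLine(fileBytes, StandardCharsets.ISO_8859_1);
        System.out.println(String.join(" | ", wrong.split(PATTERN)));

        // Matching charset: the word stays intact.
        String right = readLine(fileBytes, StandardCharsets.UTF_8);
        System.out.println(String.join(" | ", right.split(PATTERN)));
    }
}
```

For a real file, passing the charset explicitly, for example Files.newBufferedReader(path, StandardCharsets.UTF_8), avoids depending on the platform default; which charset is correct depends on how the file was actually saved.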
