
I have some simple code:

    public class Main {
        public static void main(String[] args) {
            String s = "He is a very very good boy, isn't he?";
            String[] words = s.split("[\\s\\-\\.\\'\\?\\,\\_\\@\\!]");
            System.out.println(words.length);
            for (int i = 0; i < words.length; i++) {
                System.out.println(words[i]);
            }
        }
    }

It should print this:

    10
    He
    is
    a
    very
    very
    good
    boy
    isn
    t
    he

But instead, it prints this:

    11
    He
    is
    a
    very
    very
    good
    boy

    isn
    t
    he

Can anyone suggest how to fix this? I know the problem is that when my program encounters "," it splits the string, and then the following " " splits it again, which creates an empty string (the blank line in my output). But I have no idea how to make it split on multiple delimiters at the same time.
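A minimal sketch of that behaviour, using a cut-down delimiter class `[,\\s']` chosen only for illustration (the class name `EmptyTokenDemo` is likewise hypothetical): each delimiter character is matched on its own, so the gap between `,` and the following space becomes an empty token. Empty tokens at the very end are dropped by `split`, which is why the final `?` in the real input adds nothing.

```java
import java.util.Arrays;

public class EmptyTokenDemo {
    // Hypothetical helper: splits on exactly ONE delimiter character at a time (no quantifier).
    static String[] tokens(String s) {
        return s.split("[,\\s']");
    }

    public static void main(String[] args) {
        // ", " is two adjacent delimiters, so an empty string appears between them.
        System.out.println(Arrays.toString(tokens("boy, isn't")));
        // prints [boy, , isn, t]

        // Trailing empty strings are removed, so only "he" is left here.
        System.out.println(tokens("he, ").length);
        // prints 1
    }
}
```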

3 Answers


First, although characters with special meaning in regex, like `?` and `.`, generally have to be escaped, they don't need to be escaped inside a character class, `[]`.

So your split call is equivalent to:

    String[] words = s.split("[\\s\\-.'?,_@!]");

Only `-` needs to be escaped, because inside a character class it means "to" (a range, as in `a-z`).
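A small sketch to check this, comparing the question's fully escaped class with the trimmed one, plus a bare `.` outside any class for contrast (the class and constant names are just for illustration):

```java
import java.util.Arrays;

public class CharClassEscapes {
    // The question's pattern, with every character escaped.
    static final String ESCAPED = "[\\s\\-\\.\\'\\?\\,\\_\\@\\!]";
    // Same class with only '-' escaped; '.', '?', ',' etc. are literal inside [].
    static final String TRIMMED = "[\\s\\-.'?,_@!]";

    public static void main(String[] args) {
        String s = "He is a very very good boy, isn't he?";
        // Both patterns match the same characters, so the results are identical.
        System.out.println(Arrays.equals(s.split(ESCAPED), s.split(TRIMMED)));
        // prints true

        // Outside [], an unescaped '.' matches any character: every token is empty
        // and split() drops all of them. "\\." matches only a literal dot.
        System.out.println("a.b".split(".").length);
        // prints 0
        System.out.println("a.b".split("\\.").length);
        // prints 2
    }
}
```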

Essentially, you want ", " to be treated as one delimiter. To match one or more of these characters in a row, use the `+` quantifier:

    String[] words = s.split("[\\s\\-.'?,_@!]+");

Here you are saying that a delimiter is a run of one or more of the characters in the character class.
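Put together as a complete program, a sketch of the fixed version might look like this (the class name `SplitOnRuns` and helper `splitWords` are hypothetical):

```java
public class SplitOnRuns {
    // One or more delimiter characters in a row count as a single delimiter.
    static final String DELIMS = "[\\s\\-.'?,_@!]+";

    static String[] splitWords(String s) {
        return s.split(DELIMS);
    }

    public static void main(String[] args) {
        String[] result = splitWords("He is a very very good boy, isn't he?");
        System.out.println(result.length); // prints 10
        for (String word : result) {
            System.out.println(word);
        }
    }
}
```

One thing the `+` does not change: if the string starts with a delimiter, a leading empty string is still produced, e.g. `" hi"` splits into `""` and `"hi"`.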

[Figure: visualisations of which characters each pattern matches]


Comments

Okay, thank you, it works as it should now :D But what does "escaped" mean?
In your regex, you've used lots of backslashes, haven't you? Why? @JD_KX
No idea what I used them for, to be honest. I just saw other people put them in their Stack Overflow posts, and it worked, so I did too :P
@JD_KX Okay then. Backslashes escape the character after them: whatever follows a backslash loses its special meaning. For example, `.` normally means "any character except a line terminator", but with a backslash in front of it, it means "a literal dot". Inside character classes, however, most characters with special meanings are already literal, so you don't need backslashes there. Many people put backslashes in anyway (out of habit?). Also note that in a Java string literal each regex backslash must itself be written as `\\`, which is why `\.` appears as `"\\."` in code.
Are you there? Could you help me with something that makes my program behave not quite as expected? Specifically, when I assign my string the value "" (the string is empty, but split still returns an array with one element), System.out.println(words.length) prints 1, but I want it to print 0 because nothing is split. How do I fix that?
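When the pattern matches nowhere in the input (as with `""`), `split` returns a one-element array containing the input itself, which is why the length is 1. One way around it, sketched below under the assumption that you never want empty tokens, is to filter them out after splitting (`NonEmptySplit` and `nonEmptyWords` are hypothetical names):

```java
import java.util.Arrays;

public class NonEmptySplit {
    static final String DELIMS = "[\\s\\-.'?,_@!]+";

    // Splits on runs of delimiters, then drops empty strings, so "" yields an
    // empty array and a leading delimiter no longer produces a leading "".
    static String[] nonEmptyWords(String s) {
        return Arrays.stream(s.split(DELIMS))
                .filter(w -> !w.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println("".split(DELIMS).length);  // prints 1
        System.out.println(nonEmptyWords("").length); // prints 0
        System.out.println(nonEmptyWords("He is a very very good boy, isn't he?").length); // prints 10
    }
}
```

If only the empty-string case matters, a simpler check such as `s.isEmpty() ? new String[0] : s.split(DELIMS)` would also work.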

Try it this way.

  • Replace all the characters you don't want with spaces
  • Then split on one or more spaces.

    String s = "He is a very very good boy, isn't he?";
    String[] words = s.replaceAll("[\\W]", " ").split("\\s+");
    System.out.println(words.length);
    for (int i = 0; i < words.length; i++) {
        System.out.println(words[i]);
    }

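Or just split on runs of non-word characters (`\\W` is anything except letters, digits and `_`, so `_` is no longer treated as a delimiter):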

    String[] words = s.split("\\W+");
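A minimal sketch of the `\\W+` split (the class name `NonWordSplit` is hypothetical). In Java, `\W` excludes only `[a-zA-Z_0-9]` by default, so underscores and digits stay inside tokens:

```java
import java.util.Arrays;

public class NonWordSplit {
    // \W+ matches a run of characters that are not letters, digits or '_'.
    static String[] words(String s) {
        return s.split("\\W+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(words("He is a very very good boy, isn't he?")));
        // prints [He, is, a, very, very, good, boy, isn, t, he]
        System.out.println(Arrays.toString(words("snake_case, v2")));
        // prints [snake_case, v2]
    }
}
```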


Group the character class and apply `+` to it, so a run of delimiters counts as one:

    public static void main(String[] args) {
        String s = "He is a very very good boy, isn't he?";
        String[] words = s.split("([\\s\\-.\\'\\?\\,\\_\\@\\!])+");
        System.out.println(words.length);
        for (String word : words) {
            System.out.println(word);
        }
    }
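The group in that pattern isn't required: `+` applies directly to a character class, so `([...])+` and `[...]+` split the same way. If the same pattern is used on many strings, one possible variation (a sketch, not taken from the answers above) is to compile it once with `java.util.regex.Pattern` and call `Pattern.split`; in the JDK, `String.split` compiles a pattern like this again on every call. `PrecompiledSplit` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class PrecompiledSplit {
    // Compiled once and reused across calls.
    static final Pattern DELIMS = Pattern.compile("[\\s\\-.'?,_@!]+");

    static String[] words(String s) {
        return DELIMS.split(s);
    }

    public static void main(String[] args) {
        String s = "He is a very very good boy, isn't he?";
        String[] grouped = s.split("([\\s\\-.\\'\\?\\,\\_\\@\\!])+");
        // The capturing group does not change what is split.
        System.out.println(Arrays.equals(grouped, words(s))); // prints true
        for (String word : words(s)) {
            System.out.println(word);
        }
    }
}
```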
