I'm trying to figure out which regex to use to split an essay into words WITHOUT punctuation. I tried splitting on whitespace, but that leaves punctuation attached to some tokens. I also tried splitting on word characters, which returned an array of empty strings for some reason:

String[] words = line.split("\\w+");
  • Here's the summary of regular expression constructs. Build your split regex from the appropriate constructs. Commented Mar 16, 2014 at 5:51
  • I read that. Since I want to match words, I want to use one or more word characters ([a-zA-Z_0-9]). Once it encounters non-word characters it won't match anymore, so I thought that was the regex I needed. I don't see what is wrong with my reasoning for choosing this regex. Commented Mar 16, 2014 at 5:54
  • In @SotiriosDelimanolis' link to regex constructs, look for the word boundary \b. Then please delete this post. There are answers to this question all over the internet, including SO. Commented Mar 16, 2014 at 5:54
  • Possible duplicate of Java regular expression word match. Commented Mar 16, 2014 at 5:55
  • You should also read the javadoc of split. Commented Mar 16, 2014 at 5:55

1 Answer

Try splitting on runs of non-word characters instead:

String[] words = line.split("\\W+");
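The difference comes down to what split treats as the delimiter: the regex describes the separators, not the pieces you want to keep. Splitting on \w+ throws the words away and keeps only the spaces and punctuation between them, which can look like empty strings when printed. Here is a minimal sketch comparing the two patterns; the class name SplitWords, the helper splitWords, and the sample sentence are made up for illustration.

```java
import java.util.Arrays;

class SplitWords {
    // Hypothetical helper: treats each run of non-word characters as a delimiter.
    static String[] splitWords(String line) {
        return line.split("\\W+");
    }

    public static void main(String[] args) {
        String line = "Hello, world! This is a test.";

        // \w+ as the delimiter: the words are consumed, so only the
        // separators (spaces and punctuation) are left in the array.
        String[] leftovers = line.split("\\w+");
        System.out.println(Arrays.toString(leftovers));

        // \W+ as the delimiter: the separators are consumed, so the words remain.
        String[] words = splitWords(line);
        System.out.println(Arrays.toString(words));
    }
}
```

Two things to watch for with this approach: if the line starts with punctuation (for example an opening quote), the result begins with an empty string, and since an apostrophe is a non-word character, a contraction such as "don't" is split into "don" and "t".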
1 Comment

Funny. I had \W in the back of my head and thought to use \s+ first, then got confused about what split does and used \w+, without realizing I needed its opposite.
