
I am trying to read a line from a text file, remove all punctuation (commas, periods, single quotes, double quotes, etc.) and convert the string to lowercase. The code that I am using is:

inputLine.replaceAll("[^a-zA-Z'\\s]", "").toLowerCase();

To my understanding this should do it, but it doesn't. It doesn't convert the words to lowercase either. So I added another line specifically to remove periods and commas:

inputLine.replaceAll("\\.", "");

and then split the line into a String array of words:

String[] strings = inputLine.split(" ");

However, I am still ending up with words such as "sets,", "There" and "properties:[1].". Does anyone know why this is happening, or could you suggest a solution? I have not done much regex work before, so this is all very new to me.

  • Give us some samples of what an inputLine contains and what output you're getting. Commented Jul 18, 2013 at 1:03
  • Welcome to SO. Here, you should take the tour. Commented Jul 18, 2013 at 1:09
  • arshajii has pointed out what was wrong, I wasn't re-assigning the string when I was using .replaceAll(..) Commented Jul 18, 2013 at 1:10

1 Answer


Are you reassigning inputLine? Remember: strings are immutable!

inputLine = inputLine.replaceAll("[^a-zA-Z'\\s]", "").toLowerCase();
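To make the point concrete, here is a minimal sketch using a made-up sample line (the question never posted its real input); the class name and the helper `clean` are hypothetical. `String.replaceAll` and `toLowerCase` each return a new string and leave the original untouched, so calling them without assigning the result changes nothing. Note also that the character class `[^a-zA-Z'\\s]` deliberately keeps apostrophes, so "Don't" survives as "don't".

```java
public class ReassignDemo {
    // Hypothetical helper: applies the regex from the question, then lowercases.
    static String clean(String line) {
        return line.replaceAll("[^a-zA-Z'\\s]", "").toLowerCase();
    }

    public static void main(String[] args) {
        // Made-up sample input; not the asker's actual data.
        String inputLine = "There sets, Don't properties:[1].";

        // Result discarded: inputLine still refers to the original, unchanged string.
        inputLine.replaceAll("[^a-zA-Z'\\s]", "").toLowerCase();
        System.out.println(inputLine); // There sets, Don't properties:[1].

        // Result reassigned: inputLine now refers to the cleaned, lowercased copy.
        inputLine = clean(inputLine);
        System.out.println(inputLine); // there sets don't properties
    }
}
```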

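By the way, you can also use .replaceAll("\\p{Punct}", "") to remove all ASCII punctuation. Unlike the character class above, this also strips apostrophes, but it leaves digits in place.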
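A sketch of the whole pipeline using \p{Punct}, again with a made-up input line and a hypothetical helper name `toWords`. In Java, \p{Punct} matches only the ASCII punctuation characters !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~, so the digit in "[1]" is kept. Splitting on "\\s+" instead of " " treats runs of spaces or tabs as a single separator, which avoids empty strings in the resulting array.

```java
import java.util.Arrays;

public class PunctDemo {
    // Hypothetical helper: strips ASCII punctuation, lowercases, splits into words.
    static String[] toWords(String line) {
        String cleaned = line.replaceAll("\\p{Punct}", "").toLowerCase().trim();
        // "".split("\\s+") would return [""], so return an empty array instead.
        if (cleaned.isEmpty()) {
            return new String[0];
        }
        // "\\s+" collapses consecutive whitespace into one separator.
        return cleaned.split("\\s+");
    }

    public static void main(String[] args) {
        // Made-up sample input with a double space before "properties".
        String inputLine = "There sets, Don't  properties:[1].";
        System.out.println(Arrays.toString(toWords(inputLine)));
        // prints [there, sets, dont, properties1]
    }
}
```

The trailing "1" in "properties1" shows that \p{Punct} removes only punctuation; if digits should go too, the negated class from the answer above is the better fit.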


1 Comment

I definitely forgot to re-assign inputLine, thanks for pointing that out!
