I am trying to replace all instances of sentence terminators such as '.', '?', and '!', but I do not want to replace strings like "dr." and "mr.".
I have tried the following:
text = text.replaceAll("(?![mr|mrs|ms|dr])(\\s*[\\.\\?\\!]\\s*)", "\n");
...but that does not seem to work. Any suggestions would be appreciated.
Edit: After the feedback here and a bit of tweaking, this is the working solution to my problem.
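private String convertText(String text) {
    // Collapse every run of whitespace (including line breaks) into a single space.
    text = text.replaceAll("\\s+", " ");
    // Strip parentheses, double quotes, commas and colons.
    text = text.replaceAll("[\n\r\\(\\)\"\\,\\:]", "");
    // Break after . ? ! ; when followed by whitespace or end of input,
    // unless preceded by a listed title or by a single-letter word such as an initial.
    text = text.replaceAll("(?i)(?<!dr|mr|mrs|ms|jr|sr|\\s\\w)(\\s*[\\.\\?\\!\\;](?:\\s+|$))", "\r\n");
    return text.trim();
}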
The code will extract all* compound and single sentences from an excerpt of text, removing all punctuation and extraneous white-space.
*There are some exceptions...
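The post does not list the exceptions, so the following is only a sketch of one kind that may be meant. It copies the method above into a hypothetical class named SentenceSplitDemo (made static and non-private here purely so it can be called from main). Because the lookbehind has no word boundary, any word that merely ends in "ms", "dr", "mr", "jr" or "sr" also blocks a split.

```java
public class SentenceSplitDemo {

    // Same three replacements as in the edit above; only the modifiers differ.
    static String convertText(String text) {
        text = text.replaceAll("\\s+", " ");
        text = text.replaceAll("[\n\r\\(\\)\"\\,\\:]", "");
        text = text.replaceAll(
            "(?i)(?<!dr|mr|mrs|ms|jr|sr|\\s\\w)(\\s*[\\.\\?\\!\\;](?:\\s+|$))",
            "\r\n");
        return text.trim();
    }

    public static void main(String[] args) {
        // Titles are kept together with the name that follows them.
        System.out.println(convertText("Dr. Smith arrived. Was Mr. Jones late? Yes!"));

        // "items" ends in "ms", so the lookbehind also blocks this split.
        System.out.println(convertText("He sold ten items. She left."));
    }
}
```

The first call yields three lines ("Dr. Smith arrived", "Was Mr. Jones late", "Yes"), while the second stays on one line because of the "ms" suffix. Prefixing the titles with a word boundary, for example (?<!\\b(?:dr|mr|mrs|ms|jr|sr)), might avoid that, though this variant is untested against the original data.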
Remove the [] from around the list of exceptions: (?!mr|mrs|ms|dr). Square brackets stand for a "character set", not "full strings" as you're using them. I don't know if it will entirely solve your problem, but it's worth a start.

J. H. Ronaldo says that the train is running on time.... Is he right?.
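To make the character-set point concrete, here is a minimal sketch. The class and method names are hypothetical. It compares the bracketed form with a plain alternation, then uses a negative lookbehind (as in the edit to the question) so the check applies to the text before the terminator rather than after it.

```java
import java.util.regex.Pattern;

public class CharClassDemo {

    // Square brackets define a set of single characters: m, r, s, d and '|'.
    static boolean matchesCharClass(String s) {
        return Pattern.matches("[mr|mrs|ms|dr]", s);
    }

    // A group with '|' is an alternation between whole strings.
    static boolean matchesAlternation(String s) {
        return Pattern.matches("(?:mr|mrs|ms|dr)", s);
    }

    // Assumption: a simplified splitter, not the author's full solution.
    // The lookbehind rejects a terminator that directly follows a listed title.
    static String splitWithLookbehind(String text) {
        return text.replaceAll("(?i)(?<!mr|mrs|ms|dr)\\s*[.?!]\\s*", "\n");
    }

    public static void main(String[] args) {
        System.out.println(matchesCharClass("m"));     // true: one character from the set
        System.out.println(matchesCharClass("|"));     // true: '|' is literal inside []
        System.out.println(matchesCharClass("mr"));    // false: the set matches one character only
        System.out.println(matchesAlternation("mr"));  // true
        System.out.println(matchesAlternation("mrs")); // true
        System.out.println(splitWithLookbehind("Mr. Smith left. Bye!"));
    }
}
```

Removing the brackets alone still leaves a lookahead, which inspects the characters after the current position. That is presumably why the working version in the question switched to (?<!...).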