Java replace regex before and after a period

Question

I am working with XML in an Android app, and the text sometimes ends up with sentences bumped up against each other.

Like: First sentence.Another sentence

I think I need to match a lowercase letter ([a-z]), an uppercase letter ([A-Z]), or a digit ([0-9]) on either side of the period, and then insert a space after the period.

Maybe something like:

myString = myString.replaceAll("(\\p{Ll})(\\p{Lu})", "$1 $2");

My searches and efforts have been useless so far, so any and all help is welcomed. Thanks

Couldn't you come up with a better title than "I can not find this regex"?
– devnull, Commented Feb 24, 2014 at 7:29
Your title sounds like you've lost your regex, and you need help finding it.
– user2357112, Commented Feb 24, 2014 at 7:31
Never parse XML with regex. XML is not a regular language. Use well-known XML parsers instead. See this question: stackoverflow.com/questions/8577060/…
– Madusudanan, Commented Feb 24, 2014 at 7:33
By the time I make edits to the XML, it is already a well-formatted string.
– Dustin, Commented Feb 24, 2014 at 7:34
At what point are these sentences stuck together without a space? Does the XML itself have sentences joined improperly, with no spaces or tags between them?
– user2357112, Commented Feb 24, 2014 at 7:35

Tim Pietzcker · Accepted Answer · 2014-02-24 07:36:13Z

3

You were almost there; you just forgot to match the dot:

myString = myString.replaceAll("(\\p{Ll})\\.(\\p{Lu})", "$1. $2");

And since you're not actually doing anything with the letter before and after the dot, you can speed things up a bit by using lookaround assertions:

myString = myString.replaceAll("(?<=\\p{Ll})\\.(?=\\p{Lu})", ". ");
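
As a minimal sketch, the two variants above can be compared side by side. The class name `SentenceSpacing` and the method names are hypothetical, chosen only for illustration, and the patterns are precompiled with `Pattern.compile` rather than passed to `String.replaceAll`, which is an assumption about how the code might be called repeatedly, not something the question requires.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative, not from the question.
final class SentenceSpacing {
    // Capturing-group variant: the letters are consumed and written back via $1 and $2.
    private static final Pattern CAPTURING =
            Pattern.compile("(\\p{Ll})\\.(\\p{Lu})");

    // Lookaround variant: only the dot is matched; the letters are merely asserted.
    private static final Pattern LOOKAROUND =
            Pattern.compile("(?<=\\p{Ll})\\.(?=\\p{Lu})");

    static String withGroups(String s) {
        return CAPTURING.matcher(s).replaceAll("$1. $2");
    }

    static String withLookarounds(String s) {
        return LOOKAROUND.matcher(s).replaceAll(". ");
    }

    public static void main(String[] args) {
        String input = "First sentence.Another sentence";
        System.out.println(withGroups(input));       // First sentence. Another sentence
        System.out.println(withLookarounds(input));  // First sentence. Another sentence
        // No dot here sits between a lowercase and an uppercase letter, so nothing changes.
        System.out.println(withLookarounds("C.I.A. and version 1.2"));
    }
}
```

Both methods should give the same result on the sample sentence from the question; the lookaround form simply avoids copying the surrounding letters back into the replacement.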



6 Comments

user2357112 Over a year ago

Of course, now we're putting extra spaces into acronyms written with periods. We could try to tell whether we're looking at an acronym, but then we run into more edge cases. Natural language correction is messy.

Dustin Over a year ago

Yes, but this still misses the fact that there could be a number, a lowercase letter, or an uppercase letter before and after the period.

Dustin Over a year ago

I know this is a messy thing to edit... but there will be very very few of these cases I think

Tim Pietzcker Over a year ago

If you also want to replace dots after uppercase letters and digits, just use [\\p{L}\\d] instead of \\p{Ll}, but then you'd also replace C.I.A. with C. I. A..

user2357112 Over a year ago

@TimPietzcker: Didn't see that the lookarounds were specifically lowercase and uppercase. It means we're missing different weird edge cases, but C.I.A. is currently fine.
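
The trade-off discussed in these comments can be sketched as follows. The class and method names are hypothetical. `afterLetterOrDigit` follows the suggestion above literally (only the lookbehind is widened to `[\\p{L}\\d]`), while `bothSides` is one possible reading of the request to allow letters and digits on both sides of the period; it is shown only to illustrate the extra edge cases it introduces.

```java
import java.util.regex.Pattern;

// Hypothetical helper class illustrating the edge cases raised in the comments.
final class BroadSentenceSpacing {
    // Widened lookbehind only: any letter or digit before the dot, uppercase letter after.
    private static final Pattern AFTER_LETTER_OR_DIGIT =
            Pattern.compile("(?<=[\\p{L}\\d])\\.(?=\\p{Lu})");

    // Assumed variant: any letter or digit on both sides of the dot.
    private static final Pattern BOTH_SIDES =
            Pattern.compile("(?<=[\\p{L}\\d])\\.(?=[\\p{L}\\d])");

    static String afterLetterOrDigit(String s) {
        return AFTER_LETTER_OR_DIGIT.matcher(s).replaceAll(". ");
    }

    static String bothSides(String s) {
        return BOTH_SIDES.matcher(s).replaceAll(". ");
    }

    public static void main(String[] args) {
        System.out.println(afterLetterOrDigit("Chapter 3.Next"));  // Chapter 3. Next
        System.out.println(afterLetterOrDigit("C.I.A."));          // C. I. A.
        System.out.println(afterLetterOrDigit("version 1.2"));     // version 1.2
        // Widening both sides also splits decimals and file names.
        System.out.println(bothSides("version 1.2"));              // version 1. 2
        System.out.println(bothSides("file.txt"));                 // file. txt
    }
}
```

Which variant is acceptable depends on how often acronyms, decimals, and file names appear in the text being cleaned.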
