I'm trying to process a chunk of text in PHP to remove word wrapping. Think of this as a reverse wordwrap function that affects only lines that were broken in the middle, but keeps line breaks at the end of paragraphs. Original content is in plain text format.
This is an example of the original content:
The quick brown fox jumps
over the lazy dog. Foxes
are orange and dogs are blue.
A blue bird appeared on the
window, singing jolly songs.
It should be converted to this:
The quick brown fox jumps over the lazy dog. Foxes are orange and dogs are blue.
A blue bird appeared on the window, singing jolly songs.
My logic is to create a list of accepted end-of-line characters, such as period, colon and semicolon, and remove the break from any line that does not end with one of them. I think the approach works, but I'm having a hard time translating it into a regex. Any help would be appreciated.
My progress so far:
$content = preg_replace("/(?<!\.)$/m", "XXXX", $content);
This matches the end of any line that does not end with a period. I'd still have to include the line break itself in the match, and allow for any whitespace after the period. I think I also need to create a group to match the other line-ending characters, like colons and semicolons. I'm having a hard time putting it all together.
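For reference, here is a minimal sketch of how the pieces described above might fit together. It is not a definitive solution. The function name unwrap_text is hypothetical. The sketch assumes the accepted line-ending characters are exactly period, colon and semicolon, and that paragraphs are separated by single line breaks with no blank lines, as in the example. A character class inside the lookbehind takes the place of a group, and \s is included in it so that a break following "period plus trailing whitespace" is also left alone.

```php
<?php
// Hypothetical helper; the name unwrap_text is an assumption for illustration.
function unwrap_text(string $content): string
{
    // (?<![.:;\s])  the match must not start right after an accepted
    //               line-ending character, or after whitespace trailing one
    // [ \t]*\R[ \t]* the line break itself (\R matches \n, \r\n or \r),
    //               together with any spaces or tabs around it
    $result = preg_replace('/(?<![.:;\s])[ \t]*\R[ \t]*/', ' ', $content);

    // preg_replace returns null on error; fall back to the input unchanged.
    return $result ?? $content;
}

$content = "The quick brown fox jumps\n"
    . "over the lazy dog. Foxes\n"
    . "are orange and dogs are blue.\n"
    . "A blue bird appeared on the\n"
    . "window, singing jolly songs.";

echo unwrap_text($content), "\n";
// The quick brown fox jumps over the lazy dog. Foxes are orange and dogs are blue.
// A blue bird appeared on the window, singing jolly songs.
```

Each removed break is replaced with a single space so the joined words do not run together. Whitespace left after a period at the end of a paragraph is kept as it is; it could be trimmed in a separate pass if needed.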