
I have this text: Retailer-ul Amazon foloseste metode severe pentru a-si descuraja etc. angajatii din depozite sa nu mai fure din produse. Pe ecrane li se arata siluete de angajati care au furat produse, li se spune ce au furat si cat valorau produsele, aparand si mentiunea "arestat" sau "concediat", scrie Bloomberg. Unii spun ca... and so on. I am trying to remove every string that is an abbreviation inside a sentence. For example, etc. is an abbreviation because the word that follows it, angajatii, starts with a lowercase letter. By contrast, produse. is the end of a sentence because the word that follows it, Pe, starts with an uppercase letter, so I don't want to remove it.

I have this code: $subject = preg_replace('~\b[a-z]+\.\s[a-z]~', '', $subject); It matches every abbreviation followed by a dot, a whitespace character (\s) and then a lowercase letter [a-z]. The problem is that the lowercase letter of the following word is removed as well: descuraja etc. angajatii turns into descuraja ngajatii instead of descuraja angajatii. How can I keep the same matching condition but replace only the abbreviation, the dot and the whitespace after it? Thank you.

  • How about \b[a-z]+\.\s(?=[a-z]) ? Commented Mar 10, 2016 at 9:38
  • @WiktorStribiżew dude, you rock, thanks, it works. Can you please submit it as an answer and briefly explain why adding (?=) works? Commented Mar 10, 2016 at 9:39

1 Answer


You need to wrap the [a-z] in a positive lookahead:

\b[a-z]+\.\s(?=[a-z])

See the regex demo

A lookahead only checks whether the pattern inside it appears immediately to the right of the current position; it does not add those characters to the match. So (?=[a-z]) checks that a lowercase ASCII letter comes right after the whitespace matched with \s. If it does, the match succeeds and the replacement occurs, while the letter itself stays in place. If it does not, the match fails and nothing is replaced.
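To make the difference concrete, here is a minimal sketch that runs both patterns on a shortened version of the sample text from the question. The variable names $consuming and $lookahead are only illustrative.

```php
<?php
// Shortened sample from the question: "etc." is an abbreviation,
// "produse." ends a sentence (next word starts with an uppercase letter).
$subject = 'descuraja etc. angajatii din produse. Pe ecrane';

// Original pattern: the trailing [a-z] is part of the match,
// so the first letter of the next word is removed as well.
$consuming = preg_replace('~\b[a-z]+\.\s[a-z]~', '', $subject);

// Lookahead pattern: the lowercase letter is only checked, not consumed,
// so only the abbreviation, the dot and the whitespace are removed.
$lookahead = preg_replace('~\b[a-z]+\.\s(?=[a-z])~', '', $subject);

echo $consuming, PHP_EOL; // descuraja ngajatii din produse. Pe ecrane
echo $lookahead, PHP_EOL; // descuraja angajatii din produse. Pe ecrane
```

In both cases "produse. Pe" is left untouched, because the character after the whitespace is uppercase and neither [a-z] nor (?=[a-z]) can match it.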


2 Comments

Amazing, so by adding (?=[a-z]) it matches the letter but doesn't replace it? I will accept your answer in 7 minutes.
Yes, it is called a non-consuming construct, or a zero-width assertion: it checks the text (so it takes part in matching) but does not consume it, that is, the checked substring is not put into the match value. Note that capturing inside a positive lookaround is still possible.
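A small sketch of that last point, assuming the same shortened sample text: a capturing group placed inside the lookahead still fills $m[1], even though the overall match in $m[0] stops before the captured word. The extended group ([a-z]+) is an illustration and not part of the original pattern.

```php
<?php
$subject = 'descuraja etc. angajatii din produse. Pe ecrane';

// The group inside the lookahead captures the following word,
// but that word is not part of the overall match.
preg_match('~\b[a-z]+\.\s(?=([a-z]+))~', $subject, $m);

var_dump($m[0]); // string(5) "etc. "
var_dump($m[1]); // string(9) "angajatii"
```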
