I want to remove numbers (integers and floats) from a character vector, preserving dates:
"I'd like to delete numbers like 84 and 0.5 but not dates like 2015"
I would like to get:
"I'd like to delete numbers like and but not dates like 2015"
In English a quick and dirty rule could be: if the number starts with 18, 19, or 20 and has length 4, don't delete.
I asked the same question in Python and the answer was very satisfying (\b(?!(?:18|19|20)\d{2}\b(?!\.\d))\d*\.?).
However, when I pass the same regex to gsub in R:
gsub("[\b(?!(?:18|19|20)\d{2}\b(?!\.\d))\d*\.?]"," ", "I'd like to delete numbers like 84 and 0.5 but not dates like 2015")
I get:
Error: '\d' is an unrecognized escape in character string starting ""\b(?!(?:18|19|20)\d"
In gsub, you need to double the backslashes. You should also not put the whole pattern into a character class [...]. Finally, the lookahead requires the use of perl=T:
gsub("\\b(?!(?:18|19|20)\\d{2}\\b(?!\\.\\d))\\d*\\.?\\d+\\b"," ", "I'd like to delete numbers like 84 and 0.5 but not dates like 2015", perl=T)
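Note that replacing each match with " " leaves extra spaces where the numbers used to be. As a minimal sketch (the helper name remove_numbers and the whitespace clean-up step are my own assumptions, not part of the answer above), the same pattern can be wrapped in a function that replaces matches with an empty string and then collapses the leftover whitespace:

```r
# Hypothetical helper (not from the original answer): drops integers and
# floats but keeps four-digit years that start with 18, 19, or 20.
remove_numbers <- function(x) {
  # Backslashes are doubled because the pattern is an R string literal.
  pattern <- "\\b(?!(?:18|19|20)\\d{2}\\b(?!\\.\\d))\\d*\\.?\\d+\\b"
  out <- gsub(pattern, "", x, perl = TRUE)  # perl = TRUE enables lookaheads
  out <- gsub("[[:space:]]+", " ", out)     # collapse runs of whitespace
  trimws(out)                               # drop leading/trailing spaces
}

remove_numbers("I'd like to delete numbers like 84 and 0.5 but not dates like 2015")
# [1] "I'd like to delete numbers like and but not dates like 2015"
```

The inner (?!\\.\\d) means that something like 1999.5 is still treated as a float and removed, while a bare 1999 is kept.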