I want to remove numbers (integers and floats) from a character vector, preserving dates:

"I'd like to delete numbers like 84 and 0.5 but not dates like 2015"

I would like to get:

"I'd like to delete numbers like and but not dates like 2015"

In English a quick and dirty rule could be: if the number starts with 18, 19, or 20 and has length 4, don't delete.

I asked the same question in Python and the answer was very satisfying (\b(?!(?:18|19|20)\d{2}\b(?!\.\d))\d*\.?).

However, when I pass the same regex to gsub in R:

gsub("[\b(?!(?:18|19|20)\d{2}\b(?!\.\d))\d*\.?]"," ", "I'd like to delete numbers like 84 and 0.5 but not dates like 2015")

I get:

Error: '\d' is an unrecognized escape in character string starting ""\b(?!(?:18|19|20)\d"


2 Answers

2

As I mentioned in my comments, the main points here are:

  • the pattern must be placed outside the character class ([...]): inside a class it is treated as a set of individual characters rather than as a sequence of subpatterns
  • the backslashes must be doubled in R regex patterns, since R string literals use \ as an escape character (as in \n, \r, etc.), so \d has to be written as \\d
  • patterns with lookarounds need perl=TRUE (yours uses lookaheads)
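
The three points can be checked in isolation with a small sketch. The sample strings ("a1+b", "ab", "ac") and the variable names are made up for illustration and do not come from the question.

```r
# 1. Inside [...] the pattern is a set of single characters, not a sequence:
#    "[\\d+]" means "one digit or one literal plus sign".
in_class <- gsub("[\\d+]", "", "a1+b", perl = TRUE)

# 2. Backslashes are doubled in the R source, but the regex engine
#    receives a single one: the string "\\d" is two characters long.
n_chars <- nchar("\\d")

# 3. Lookarounds such as (?!...) need the PCRE engine, i.e. perl = TRUE.
#    "a(?!b)" matches an "a" that is not followed by "b".
has_match <- grepl("a(?!b)", c("ab", "ac"), perl = TRUE)

print(in_class)   # [1] "ab"
print(n_chars)    # [1] 2
print(has_match)  # [1] FALSE  TRUE
```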

Use

gsub("\\b(?!(?:18|19|20)\\d{2}\\b(?!\\.\\d))\\d*\\.?\\d+\\b", " ", "I'd like to delete numbers like 84 and 0.5 but not dates like 2015", perl=TRUE)

See IDEONE demo.
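
Replacing each number with a space leaves runs of spaces behind. Here is a minimal sketch of one way to reach the exact output asked for in the question, assuming it is acceptable to collapse every run of whitespace afterwards. The names x, num_pattern, no_numbers, and cleaned are illustrative.

```r
x <- "I'd like to delete numbers like 84 and 0.5 but not dates like 2015"

# Same pattern as above, stored once so it can be reused.
num_pattern <- "\\b(?!(?:18|19|20)\\d{2}\\b(?!\\.\\d))\\d*\\.?\\d+\\b"

# Drop the numbers, then collapse the leftover whitespace runs
# and trim any space left at either end.
no_numbers <- gsub(num_pattern, "", x, perl = TRUE)
cleaned <- trimws(gsub("[[:space:]]+", " ", no_numbers))

print(cleaned)
# [1] "I'd like to delete numbers like and but not dates like 2015"
```

Note that a year-like value followed by a decimal part, such as 2015.5, is still treated as a number and removed, because of the inner (?!\\.\\d) check.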

1

To search and replace in R you can use:

gsub("\\b(?!(?:18|19|20)\\p{Nd}{2}\\b(?!\\.\\p{Nd}))\\p{Nd}*\\.?\\p{Nd}+\\b", "replacement_text_here", subject, perl=TRUE)
