I want to extract a pattern of phrases from the following sentences.

text1 <- "On a year-on-year basis, the number of subscribers of Netflix increased 1.15% in November last year."

text2 <- "There is no confirmed audited number of subscribers in the Netflix's earnings report."

text3 <- "Netflix's unaudited number of subscribers has grown more than 1.50% at the last quarter."

The phrases I want to match are "number of subscribers", "audited number of subscribers", and "unaudited number of subscribers".

I am using the pattern \\bnumber\\s+of\\s+subscribers?\\b from a previous question (thanks to @wiktor-stribiżew) to extract the phrases.

library(stringr)

find_words <- function(text) {
  # matches only the bare phrase, with an optional plural "s"
  pattern <- "\\bnumber\\s+of\\s+subscribers?\\b" # something like this
  str_extract(text, pattern)
}

However, this extracts only the exact phrase "number of subscribers", not the variants preceded by "audited" or "unaudited".

Desired output:

find_words(text1)

'number of subscribers'

find_words(text2)

'audited number of subscribers'

find_words(text3)

'unaudited number of subscribers'

1 Answer

Try making the "audited " / "unaudited " prefix an optional group:

find_words <- function(text) {
  # optional "audited " or "unaudited " prefix, then the base phrase
  pattern <- "(audited |unaudited )?number\\s+of\\s+subscribers"
  str_extract(text, pattern)
}

You can test it with the sample texts you provided:

find_words(text1)
# 'number of subscribers'
find_words(text2)
# 'audited number of subscribers'
find_words(text3)
# 'unaudited number of subscribers'
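
The pattern above drops the word boundaries and the optional plural "s" from the question's original regex. A minimal sketch of a variant that restores them, assuming you still want both: the name `pattern_strict` and the vector `texts` are made up for this example, and the sentences are the three from the question.

```r
library(stringr)

# Assumed variant of the answer's pattern: \\b on both ends, \\s+ after the
# prefix, and the optional plural "s" carried over from the question.
pattern_strict <- "\\b(?:audited\\s+|unaudited\\s+)?number\\s+of\\s+subscribers?\\b"

texts <- c(
  "On a year-on-year basis, the number of subscribers of Netflix increased 1.15% in November last year.",
  "There is no confirmed audited number of subscribers in the Netflix's earnings report.",
  "Netflix's unaudited number of subscribers has grown more than 1.50% at the last quarter."
)

# str_extract() is vectorised, so all sentences are handled in one call;
# it returns the first match per element, or NA when nothing matches.
found <- str_extract(texts, pattern_strict)
print(found)
```

With the leading `\\b`, a prefix such as "audited" can only match at the start of a word, so it cannot be picked up from the tail of a longer word.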

1 Comment

Alternative version: "(un)?(audited )?number\\s+of\\s+subscribers?\\b"
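
One thing to be aware of with this alternative: the two groups are independently optional, so "un" can match on its own without "audited ". A small sketch of the difference, assuming a nested group is acceptable; the names `loose`, `nested`, and the input `odd` are invented for illustration.

```r
library(stringr)

# Pattern from the comment: "un" and "audited " are independently optional.
loose <- "(un)?(audited )?number\\s+of\\s+subscribers?\\b"

# Assumed alternative: "un" is only allowed as part of "unaudited ".
nested <- "((un)?audited )?number\\s+of\\s+subscribers?\\b"

odd <- "the unnumber of subscribers"  # made-up input, for illustration only

loose_hit  <- str_extract(odd, loose)   # "unnumber of subscribers"
nested_hit <- str_extract(odd, nested)  # "number of subscribers"

print(c(loose = loose_hit, nested = nested_hit))
```

On the three sample sentences from the question both patterns return the same phrases; they only differ on inputs like the one above.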
