16

I have a long string. Part of it is:

x <- "Text1 q10_1 text2 q17 text3 q22_5 ..."

How can I subtract 1 from each number that follows the letter "q" to obtain the following?

y <- "Text1 q9_1 text2 q16 text3 q21_5 ..."

I can extract all my numbers from x:

numbers <- stringr::str_extract_all(x, "(?<=q)\\d+")
numbers <- as.integer(numbers[[1]]) - 1

But how can I update x with these new numbers?

The following does not work:

stringr::str_replace_all(x, "(?<=q)\\d+", as.character(numbers))

5 Answers

28

I learned today that stringr::str_replace_all will take a function:

stringr::str_replace_all(
  x, 
  "(?<=q)\\d+", 
  \(x) as.character(as.integer(x) - 1)
)

1 Comment

Yes it's nice isn't it? JavaScript and Python support this too. If you're interested in how this is done in other languages it's covered in answers to this question: stackoverflow.com/questions/26171318/…
18

We can use gregexpr and regmatches for this:

x <- "Text1 q10_1 text2 q17 text3 q22_5 ..."
gre <- gregexpr("(?<=q)[0-9]+", x, perl = TRUE)
regmatches(x, gre)
# [[1]]
# [1] "10" "17" "22"
regmatches(x, gre) <- lapply(regmatches(x, gre), function(z) as.integer(z) - 1L)
x
# [1] "Text1 q9_1 text2 q16 text3 q21_5 ..."

9

Here is an alternative approach using the gsubfn function from the gsubfn package:

  1. gsubfn matches "q" followed by one or more digits.
  2. The digits are captured as a group.
  3. Each captured group is passed to the inline function ~ paste0("q", as.numeric(x) - 1), which converts the captured digits to a number, subtracts 1, and prepends "q" again, because the "q" is part of the match being replaced.

#install.packages("gsubfn")
library(gsubfn)

gsubfn("q(\\d+)", ~ paste0("q", as.numeric(x) - 1), x)
# [1] "Text1 q9_1 text2 q16 text3 q21_5 ..."

1 Comment

That package never ceases to amaze me, nice!
0

Here is a rather convoluted solution that should still work:

library(dplyr)
library(stringr)
library(magrittr)
library(tidyr)
x <- "Text1 q10_1 text2 q17 text3 q22_5 ..."

x <- data.frame(x, stringsAsFactors = FALSE) #Convert to data.frame

x %<>% 
  separate_longer_delim(cols = everything(), " ") %>% #Split the string into separate rows.
  mutate(s = as.numeric(str_extract(x, "(?<=q)[0-9]+"))-1) %>% #Extract the value and subtract 1 from it.
  mutate(x1 = str_replace(x, "(?<=q)[0-9]+", as.character(s))) %>% #Replace the value with this new value.
  select(x1) %>% #Retain only data necessary to produce the output.
  mutate(x1 = paste0(x1, collapse = " ")) %>% #Collapse this into a string.
  distinct(x1) #Discard superfluous copies of the string.

x <- x$x1
x
# [1] "Text1 q9_1 text2 q16 text3 q21_5 ..."

0

Here is a Python version that uses plain string operations in place of regular expressions. It can probably be cleaned up a bit.

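x = "Text1 q10_1 text2 q17 text3 q22_5 ..."

list_y = []

for y in x.split():
    # Split a token such as "q22_5" into "q22" and the optional "_5" suffix.
    head, sep, tail = y.partition('_')
    if head.startswith('q') and head[1:].isdecimal():
        # Subtract 1 from the number after "q" and reattach the suffix.
        r_string = 'q' + str(int(head[1:]) - 1) + sep + tail
    else:
        # Leave every other token (including words that merely start with "q") as is.
        r_string = y
    list_y.append(r_string)

final = ' '.join(list_y)
print(final)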

The result is:

Text1 q9_1 text2 q16 text3 q21_5 ...

Having shown the above, regex is a more suitable method.
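
For comparison, the regex route in Python might look like the sketch below. It assumes the same lookbehind pattern used in the R answers and relies on re.sub accepting a function as the replacement; the name decrement is made up for this illustration.

```python
import re

x = "Text1 q10_1 text2 q17 text3 q22_5 ..."


def decrement(match: re.Match) -> str:
    # Hypothetical helper: match.group(0) is the run of digits that follows "q".
    return str(int(match.group(0)) - 1)


# (?<=q) is a lookbehind: the digits must be preceded by "q",
# but "q" itself is not part of the match and is left untouched.
y = re.sub(r"(?<=q)\d+", decrement, x)
print(y)
```

Unlike the split-and-join loop, this leaves the original whitespace alone, because only the matched digits are rewritten.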
