I have a character vector and a data.table:

library(data.table)
pselection <- c("12345" , "2345", "12345678")
dt <- data.table("title"=c("First title", "Second Title", "Third Title", "Fourth Title"), 
                 "sha"=c("12345", "2345; 66543; 33423", "22222; 12345678;", "666662345; 444"))

Now I want to select all rows of the data.table where any of the ;-separated entries in the sha column matches an element of pselection. So basically I want this output:

          title                sha
1:  First title              12345
2: Second Title 2345; 66543; 33423
3:  Third Title   22222; 12345678;

How would I do this?

I tried this:

selected <- dt[sha %in% pselection]

but it only selects exact matches, and %like% matches a single pattern, not many. Concatenating everything into one regular expression (like paste(pselection, collapse="|")) is out of the question because my pselection has more than 10,000 entries. Thanks in advance for the help!

  • Do you need to use data.table or are you just looking for a solution for selecting partial string matches? Commented May 8, 2020 at 15:49
  • Would be nice to use data.table but any efficient enough solution is appreciated! Commented May 8, 2020 at 15:54

2 Answers


I have a solution in mind using lapply and tstrsplit. There is probably a more elegant way, but it does the job:

lapply(1:nrow(dt), function(i) {
  dt[i,'match' := any(trimws(tstrsplit(as.character(dt[i,'sha']),";")) %in% pselection)]
  })

dt[(match)]
          title                sha match
1:  First title              12345  TRUE
2: Second Title 2345; 66543; 33423  TRUE
3:  Third Title   22222; 12345678;  TRUE

The idea is to split each row's sha value on ; (trimming whitespace, otherwise row 3 will not match) and check whether any of the resulting IDs appears in pselection.
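The same split-and-match idea can probably be written without modifying dt row by row. The following is only a sketch under that assumption: it reuses the example data from the question, and the names tokens, keep, and selected are made up for illustration. Each token is tested for exact membership, so no large regular expression is built.

```r
library(data.table)

pselection <- c("12345", "2345", "12345678")
dt <- data.table(
  title = c("First title", "Second Title", "Third Title", "Fourth Title"),
  sha   = c("12345", "2345; 66543; 33423", "22222; 12345678;", "666662345; 444")
)

# Split every sha string on ";" (one character vector of tokens per row).
tokens <- strsplit(dt$sha, ";", fixed = TRUE)

# Keep a row if any of its whitespace-trimmed tokens is an exact member
# of pselection. %chin% is data.table's character-only version of %in%.
keep <- vapply(tokens, function(x) any(trimws(x) %chin% pselection), logical(1))

selected <- dt[keep]
selected
```

Because the comparison is an exact lookup per token, "666662345" in the fourth row is not treated as a match for "2345".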


Using regex:

pselection <- paste0("\\b", pselection, "\\b") # \b is a word boundary (; and whitespace count), so each ID must match a whole entry
dt[grepl(paste(pselection, collapse = "|"), sha)]

          title                sha
1:  First title              12345
2: Second Title 2345; 66543; 33423
3:  Third Title   22222; 12345678;
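A note on the word boundaries, as a small base R check: a pattern anchored only at its start also matches a longer ID that merely begins with the same digits, whereas a boundary on both sides requires the whole entry to match. The sha value "23456; 999" below is hypothetical and only serves to illustrate the difference.

```r
# Hypothetical sha value whose first entry merely starts with the ID "2345".
x <- "23456; 999"

# Leading boundary only: matches, because "2345" is a prefix of "23456".
prefix_only <- grepl("\\b2345", x)

# Boundaries on both sides: no match, since "2345" is followed by another digit.
both_sides <- grepl("\\b2345\\b", x)

# A genuine entry still matches when both boundaries are present.
real_hit <- grepl("\\b2345\\b", "2345; 66543; 33423")

c(prefix_only = prefix_only, both_sides = both_sides, real_hit = real_hit)
```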
