R use variable within regex

Question

Okay - maybe this is a better example. I am looking for guidance/references on how to reference a variable within a regex - not how to build a regex for this data.

How can you use a value from a variable to regex the next variable?

library(plyr)    
library(tm)
library(stringr)
library(gsubfn)

Dataset of velocities

d1 <- data.frame(sub = c("LEFT CAROTID STENOSIS: (50-69)APPROXIMATELY 50-55% (0-49)LESS THAN 50% COMMON:", "LEFT CAROTID STENOSIS: (50-69)APPROXIMATELY 60-70% (0-49)LESS THAN 50% COMMON:", "LEFT CAROTID STENOSIS: (40-60)APPROXIMATELY 40% INCOMPLETE SCAN SEE NOTES (40-50)LESS THAN 50% COMMON:"), stringsAsFactors = FALSE)

d1$sub
[1] "LEFT CAROTID STENOSIS: (50-69)APPROXIMATELY 50-55% (0-49)LESS THAN 50% COMMON:"
[2] "LEFT CAROTID STENOSIS: (50-69)APPROXIMATELY 60-70% (0-49)LESS THAN 50% COMMON:"
[3] "LEFT CAROTID STENOSIS: (40-60)APPROXIMATELY 40% INCOMPLETE SCAN SEE NOTES (40-50)LESS THAN 50% COMMON:"

extract sub1

d1$sub1 <- as.character(lapply((strapply(d1$sub,"((?<=LEFT CAROTID STENOSIS:).{5,}?(?=(\\(|COMMON)))", perl=TRUE)), unique))
d1$sub1
[1] " (50-69)APPROXIMATELY 50-55% "                       
[2] " (50-69)APPROXIMATELY 60-70% "                       
[3] " (40-60)APPROXIMATELY 40% INCOMPLETE SCAN SEE NOTES "

Now reference sub1 to get sub2 from the data

I want to return "(0-49)LESS THAN 50%", "(0-49)LESS THAN 50%", and "(40-50)LESS THAN 50%".

d1$sub2 <- as.character(lapply((strapply(d1$sub,"((?<=\\d1$sub1).*?(?=COMMON))", perl=TRUE)), unique))
d1$sub2
[1] "NULL" "NULL" "NULL"

** Original Post Below **

I am extracting medical information from text reports, and am attempting to use one variable ($sub1) as part of a regex to find the next variable ($sub2).

How can you use a value from a variable to regex the next variable?

library(plyr)
library(tm)
library(stringr)
library(gsubfn)

#Dataset of velocities
d1 <- c("CCA: 135 cm/sec ICA: 50 cm/sec", "CCA: 150 cm/sec ICA: 75 cm/sec")
d1
[1] "CCA: 135 cm/sec ICA: 50 cm/sec" "CCA: 150 cm/sec ICA: 75 cm/sec"

#Lookahead to get sub1
d1$sub1 <- as.character(lapply((strapply(d1,"(.*?(?=ICA:))", perl=TRUE)), unique))
Warning message:
In d1$sub1 <- as.character(lapply((strapply(d1, "(.*?(?=ICA:))",  :
 Coercing LHS to a list
d1
[[1]]
[1] "CCA: 135 cm/sec ICA: 50 cm/sec"

[[2]]
[1] "CCA: 150 cm/sec ICA: 75 cm/sec"

$sub1
[1] "CCA: 135 cm/sec " "CCA: 150 cm/sec "

#Now reference sub1 to get sub2 - does not work?
#Want to return "ICA:50 cm/sec" and "ICA:75 cm/sec"
#Used paste(d1$sub1) to try getting the $sub1 variable into the regex, but doesn't work)
d1$sub2 <- as.character(lapply((strapply(d1,"((?<=paste(d1$sub1)).*?)", perl=TRUE)), unique))
d1$sub2
[1] "NULL" "NULL" "NULL"

The text has structure, but is very variable in terms of length, content, etc. Defining the first variable ($sub1) is easy, but using it to define the second variable will be the most precise.

Maybe I should have emphasized that the text is very variable - so a simple regex based on the text pattern will not work. I need to use the first variable to locate the second within the text. It is medical information so I can't post actual data.

Not sure I fully understand your question. The desired final output is 2 variables ($sub1 = CCA: 135 cm/sec, $sub2 = ICA: 50 cm/sec). I can generate the variables but am struggling with how to reference the first to get a locator for the second.
– user2547308, Commented Aug 21, 2013 at 15:39

Carolina Timoteo · Accepted Answer · 2014-05-01 05:03:28Z

6

Try using the paste0() function. It concatenates your variables and any regular-expression fragments into a single pattern string.

grep(paste0("^.*", variable, ".*$"), d1)

You can also pass the argument collapse = "" to paste0() if your variable could have more than one element; that joins the elements into a single string.
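A minimal sketch of that idea applied to the second example in the question, assuming base R only. R does not substitute variable names written inside a quoted string, which is why "\\d1$sub1" and "paste(d1$sub1)" inside the pattern are matched as literal text. The names sub1, patterns and sub2 are only illustrative, and a capture group is used here in place of the lookbehind from the question.

```r
# Example strings from the question's original post
d1 <- c("CCA: 135 cm/sec ICA: 50 cm/sec", "CCA: 150 cm/sec ICA: 75 cm/sec")

# First piece: everything before "ICA:" (keeps the trailing space)
sub1 <- sub("(.*)ICA:.*", "\\1", d1)

# Build one pattern per element by pasting the variable's value into the regex
patterns <- paste0("^", sub1, "(.*)$")

# Apply each pattern to its own string and keep what follows sub1
sub2 <- mapply(function(p, x) sub(p, "\\1", x), patterns, d1, USE.NAMES = FALSE)
sub2
# [1] "ICA: 50 cm/sec" "ICA: 75 cm/sec"
```

This works without extra care only because the values in sub1 contain no regex metacharacters; values with parentheses, dots or similar would need escaping first.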

answered May 1, 2014 at 5:03

Carolina Timoteo

G. Grothendieck · Accepted Answer · 2013-08-21 15:43:03Z

2

Try this:

> d1 <- c("CCA: 135 cm/sec ICA: 50 cm/sec", "CCA: 150 cm/sec ICA: 75 cm/sec")
> t(strapplyc(d1, "\\w+: \\S+ \\S+", simplify = TRUE))
     [,1]              [,2]            
[1,] "CCA: 135 cm/sec" "ICA: 50 cm/sec"
[2,] "CCA: 150 cm/sec" "ICA: 75 cm/sec"

answered Aug 21, 2013 at 15:43

G. Grothendieck

Community · Accepted Answer · 2017-05-23 12:28:55Z

0

You'll need to escape various characters to use variables in regex, but why not do the simpler thing?

sub('(.*)ICA.*', '\\1', d1)
#[1] "CCA: 135 cm/sec " "CCA: 150 cm/sec "
sub('.*(ICA.*)', '\\1', d1)
#[1] "ICA: 50 cm/sec" "ICA: 75 cm/sec"
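If the variable does have to go into the pattern, here is a rough sketch of the escaping step using one of the stenosis strings from the question. The helper escape_regex is hypothetical (written here for illustration, not taken from a package), and the fixed = TRUE route is an alternative that avoids escaping altogether. Without escaping, "(50-69)" in the variable would be read as a regex group rather than as literal parentheses, so the pattern would not match.

```r
# One string from the question and the piece already extracted from it
x    <- "LEFT CAROTID STENOSIS: (50-69)APPROXIMATELY 50-55% (0-49)LESS THAN 50% COMMON:"
sub1 <- " (50-69)APPROXIMATELY 50-55% "

# Hypothetical helper: put a backslash in front of regex metacharacters
escape_regex <- function(s) gsub("([.|()\\^{}+$*?]|\\[|\\])", "\\\\\\1", s)

# Option 1: escaped variable pasted into the pattern, capture what follows it
pattern  <- paste0(".*", escape_regex(sub1), "(.*) COMMON.*")
sub2_esc <- sub(pattern, "\\1", x)

# Option 2: treat sub1 as a literal string, so no escaping is needed
hit        <- regexpr(sub1, x, fixed = TRUE)
start      <- as.integer(hit) + attr(hit, "match.length")
sub2_fixed <- sub("\\s*COMMON.*$", "", substring(x, start))

sub2_esc
# [1] "(0-49)LESS THAN 50%"
```

Both options give the same result for this string; the literal search is probably the safer choice when the variable's content is unpredictable.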

edited May 23, 2017 at 12:28

answered Aug 21, 2013 at 15:40

eddi
