R Combining strings between known strings

Question

I have a long vector of strings that has a certain structure. I would like to combine the strings in a way that reveals this structure. An example will make this clear.

chr_vec <- c("Random Title", "Start", "dsf", "sdvf", "Stop", "Start", "dsf", "sdvf", "Stop", "Start", "dsf", "sdvf", "Stop", "Another Random Title", "Start", "erg", "vdf", "vfd", "efw", "Stop",
             "Start", "erg", "vdf", "vfd", "efw", "Stop", "Start", "erg", "vdf", "vfd", "efw", "Stop")

So I have a random title, followed by words between Start and Stop (both included) that should be combined into one block. The random titles should be kept, so I know which title each block belongs to. The result would be something like this:

result <- list("Random Title" = list(c("Start", "dsf", "sdvf", "Stop"), c("Start", "dsf", "sdvf", "Stop"), c("Start", "dsf", "sdvf", "Stop")),
               "Another Random Title" = list(c("Start", "erg", "vdf", "vfd", "efw", "Stop"), c("Start", "erg", "vdf", "vfd", "efw", "Stop"), c("Start", "erg", "vdf", "vfd", "efw", "Stop")))
> result
$`Random Title`
$`Random Title`[[1]]
[1] "Start" "dsf"   "sdvf"  "Stop" 

$`Random Title`[[2]]
[1] "Start" "dsf"   "sdvf"  "Stop" 

$`Random Title`[[3]]
[1] "Start" "dsf"   "sdvf"  "Stop" 


$`Another Random Title`
$`Another Random Title`[[1]]
[1] "Start" "erg"   "vdf"   "vfd"   "efw"   "Stop" 

$`Another Random Title`[[2]]
[1] "Start" "erg"   "vdf"   "vfd"   "efw"   "Stop" 

$`Another Random Title`[[3]]
[1] "Start" "erg"   "vdf"   "vfd"   "efw"   "Stop"

I don't know how many strings there are between Start and Stop, and the titles are random. My data doesn't need to be a vector. I tried this with a tibble and cumsum, but that fails because of the titles, which I need to keep.

My effort:

library(dplyr)

# Number the blocks: every row gets the count of "Start"s seen so far
res <- tibble(text = chr_vec) %>% 
  mutate(group = cumsum(text == "Start"))

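This almost works, but the titles break it: since only the Starts are counted, each title lands in the same group as the block before it (the first title gets group 0), so the titles can't be told apart from the block contents.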
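A minimal base-R sketch of how the counting idea could be rescued, assuming "Start" and "Stop" match exactly, blocks are never nested or left open, and the vector begins with a title. Besides counting Starts, it tracks whether each element lies inside a Start..Stop block; anything outside a block is treated as a title. The names `is_start`, `depth` and `is_title` are illustrative, and a plain `data.frame` stands in for the tibble so the sketch needs no packages.

```r
chr_vec <- c("Random Title", "Start", "dsf", "sdvf", "Stop",
             "Start", "dsf", "sdvf", "Stop",
             "Start", "dsf", "sdvf", "Stop",
             "Another Random Title", "Start", "erg", "vdf", "vfd", "efw", "Stop",
             "Start", "erg", "vdf", "vfd", "efw", "Stop",
             "Start", "erg", "vdf", "vfd", "efw", "Stop")

is_start <- chr_vec == "Start"
is_stop  <- chr_vec == "Stop"

# 1 for every element from a "Start" up to and including its "Stop", 0 otherwise:
# Starts seen so far minus Stops seen strictly before this position.
# Assumes blocks are not nested and every Start has a matching Stop.
depth    <- cumsum(is_start) - cumsum(c(FALSE, head(is_stop, -1)))
is_title <- depth == 0

res <- data.frame(
  text  = chr_vec,
  # Most recent title for each row (assumes the first element is a title)
  title = chr_vec[is_title][cumsum(is_title)],
  # Running Start counter, as in the original attempt
  block = cumsum(is_start),
  stringsAsFactors = FALSE
)
res <- res[!is_title, ]  # drop the title rows themselves
```

From here, `split(res$text, res$block)` gives the individual blocks, and `res$title` records which title each block belongs to.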

If the titles are random, how will you identify them? What if a title is "efw"?
– user2974951, Commented Nov 8, 2021 at 13:53
Well, I hoped that since a title always appears in the sequence Stop - Title - Start, it could be detected that a new list is starting. So if we first combine everything between Start and Stop, we could maybe compare lengths and see that the leftover elements are titles, each applying to everything after it until the next title. A title never appears inside a Start-Stop sequence.
– Hakki, Commented Nov 8, 2021 at 13:55
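A sketch of the rule described in the comment above (a title never appears inside a Start-Stop sequence), under the same assumptions: exact "Start"/"Stop" matches, no nesting, and a title before the first block. Elements covered by some Start..Stop range form blocks, everything else is a title, and `findInterval()` assigns each block to the nearest title before it. The names `blocks`, `title_pos` and `owner` are hypothetical.

```r
chr_vec <- c("Random Title", "Start", "dsf", "sdvf", "Stop",
             "Start", "dsf", "sdvf", "Stop",
             "Start", "dsf", "sdvf", "Stop",
             "Another Random Title", "Start", "erg", "vdf", "vfd", "efw", "Stop",
             "Start", "erg", "vdf", "vfd", "efw", "Stop",
             "Start", "erg", "vdf", "vfd", "efw", "Stop")

# Exact matches, so a title such as "Startup" is not taken as a delimiter
starts <- which(chr_vec == "Start")
stops  <- which(chr_vec == "Stop")

# One character vector per Start..Stop block
blocks <- Map(function(from, to) chr_vec[from:to], starts, stops)

# Positions not covered by any block are titles
in_block  <- unlist(Map(seq, starts, stops))
title_pos <- setdiff(seq_along(chr_vec), in_block)
titles    <- chr_vec[title_pos]

# Index of the last title position at or before each Start
# (would be 0 for a block with no preceding title)
owner <- findInterval(starts, title_pos)

# Group blocks by their title, keeping titles in order of appearance
result <- split(blocks, factor(titles[owner], levels = unique(titles)))
```

Using `factor(..., levels = unique(titles))` keeps the titles in their original order, whereas a plain character grouping would be sorted alphabetically; a title that occurs twice would have its blocks merged under one name.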

user2974951 · Accepted Answer · 2021-11-08 14:12:07Z


A solution in base R

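# Positions of the delimiters (exact matches, so a title such as
# "Startup" is not mistaken for a delimiter, as it would be with grep())
t1 <- which(chr_vec == "Start")
t2 <- which(chr_vec == "Stop")
# Index range of each block; SIMPLIFY = FALSE keeps a list even when
# all blocks happen to have the same length
sek <- mapply(seq, t1, t2, SIMPLIFY = FALSE)

j <- 1
lst <- list()
for (i in seq_along(sek)) {
  if (i == 1) {
    # The first element of the vector is the first title
    tit <- chr_vec[1]
  } else if (head(sek[[i]], 1) - tail(sek[[i - 1]], 1) != 1) {
    # A gap between the previous Stop and this Start means a new title
    # sits just before this Start
    tit <- chr_vec[head(sek[[i]], 1) - 1]
    j <- 1
  }
  # Store the block as the j-th entry under the current title
  lst[[tit]][[j]] <- chr_vec[sek[[i]]]
  j <- j + 1
}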

resulting in

$`Random Title`
$`Random Title`[[1]]
[1] "Start" "dsf"   "sdvf"  "Stop" 

$`Random Title`[[2]]
[1] "Start" "dsf"   "sdvf"  "Stop" 

$`Random Title`[[3]]
[1] "Start" "dsf"   "sdvf"  "Stop" 


$`Another Random Title`
$`Another Random Title`[[1]]
[1] "Start" "erg"   "vdf"   "vfd"   "efw"   "Stop" 

$`Another Random Title`[[2]]
[1] "Start" "erg"   "vdf"   "vfd"   "efw"   "Stop" 

$`Another Random Title`[[3]]
[1] "Start" "erg"   "vdf"   "vfd"   "efw"   "Stop"
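One detail worth checking in this approach: by default `mapply()` simplifies its result, so if every block happens to have the same length it returns a matrix rather than a list, and `sek[[i]]` would then pick out a single number instead of a whole block. The positions below are the three equal-length blocks under "Random Title" in `chr_vec` (Start at 2, 6, 10; Stop at 5, 9, 13):

```r
starts <- c(2, 6, 10)
stops  <- c(5, 9, 13)

# Default: equal-length results are simplified into a 4 x 3 matrix
simplified <- mapply(seq, starts, stops)
# SIMPLIFY = FALSE always returns a list with one vector per block
as_list <- mapply(seq, starts, stops, SIMPLIFY = FALSE)

dim(simplified)   # 4 3
simplified[[1]]   # 2 (a single element, not the first block)
as_list[[1]]      # 2 3 4 5
```

Passing `SIMPLIFY = FALSE` (or using `Map()`, which never simplifies) keeps a list regardless of the block lengths.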

edited Nov 8, 2021 at 14:12

answered Nov 8, 2021 at 14:06

user2974951
