I have a long vector of strings with a certain structure. I would like to combine the strings so that this structure becomes visible. An example will make this clearer.

chr_vec <- c("Random Title", "Start", "dsf", "sdvf", "Stop", "Start", "dsf", "sdvf", "Stop", "Start", "dsf", "sdvf", "Stop", "Another Random Title", "Start", "erg", "vdf", "vfd", "efw", "Stop",
             "Start", "erg", "vdf", "vfd", "efw", "Stop", "Start", "erg", "vdf", "vfd", "efw", "Stop")

So I have a random title followed by groups of words between Start and Stop (both included), and each such group should be combined. The titles should be kept as well, so I know which title each block belongs to. The result would look something like this:

result <- list("Random Title" = list(c("Start", "dsf", "sdvf", "Stop"), c("Start", "dsf", "sdvf", "Stop"), c("Start", "dsf", "sdvf", "Stop")),
               "Another Random Title" = list(c("Start", "erg", "vdf", "vfd", "efw", "Stop"), c("Start", "erg", "vdf", "vfd", "efw", "Stop"), c("Start", "erg", "vdf", "vfd", "efw", "Stop")))
result
$`Random Title`
$`Random Title`[[1]]
[1] "Start" "dsf"   "sdvf"  "Stop" 

$`Random Title`[[2]]
[1] "Start" "dsf"   "sdvf"  "Stop" 

$`Random Title`[[3]]
[1] "Start" "dsf"   "sdvf"  "Stop" 


$`Another Random Title`
$`Another Random Title`[[1]]
[1] "Start" "erg"   "vdf"   "vfd"   "efw"   "Stop" 

$`Another Random Title`[[2]]
[1] "Start" "erg"   "vdf"   "vfd"   "efw"   "Stop" 

$`Another Random Title`[[3]]
[1] "Start" "erg"   "vdf"   "vfd"   "efw"   "Stop" 

The number of strings between Start and Stop varies, and the titles are arbitrary. The data doesn't have to be a vector. I tried using a tibble with cumsum(), but that fails because of the titles, which I need to keep.

My effort:

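library(dplyr)  # provides tibble(), mutate() and %>%

# Number the blocks: the counter goes up by one at every "Start"
res <- tibble(text = chr_vec) %>% 
  mutate(group = cumsum(text == "Start"))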

This almost works, but the titles break the approach: each title inherits the group number of the block before it, so it is assigned to the wrong group.
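To see concretely where the titles go wrong, here is the same cumsum() counter in base R (equivalent to the mutate() call above, just without the tibble). Each title picks up the counter value of the block that precedes it:

```r
chr_vec <- c("Random Title", "Start", "dsf", "sdvf", "Stop", "Start", "dsf", "sdvf", "Stop",
             "Start", "dsf", "sdvf", "Stop", "Another Random Title",
             "Start", "erg", "vdf", "vfd", "efw", "Stop",
             "Start", "erg", "vdf", "vfd", "efw", "Stop",
             "Start", "erg", "vdf", "vfd", "efw", "Stop")

# Same counter as mutate(group = cumsum(text == "Start"))
group <- cumsum(chr_vec == "Start")

# The first title gets group 0, but "Another Random Title" gets group 3,
# i.e. it is lumped in with the last "Random Title" block
group[chr_vec %in% c("Random Title", "Another Random Title")]
# [1] 0 3
```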

  • If the titles are random, how will you identify them? What if a title is "efw"? Commented Nov 8, 2021 at 13:53
  • Well, I hoped that since the sequence is always STOP - TITLE - START, a new sub-list could be detected. If we first combine everything between START and STOP, then maybe by comparing lengths we could see that the leftover elements are titles, each covering everything after it up to the next title. A title never appears inside a START-STOP sequence. Commented Nov 8, 2021 at 13:55
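The rule stated in the comment above (a title only ever sits between a Stop and the next Start) is enough to find the titles without knowing them in advance: anything outside a Start...Stop block is a title, even if it happens to be "efw". Below is a base-R sketch of that idea that keeps the cumsum() trick from the question. group_blocks() is a hypothetical helper name, and the sketch assumes exactly one title precedes each run of blocks and that "Start"/"Stop" never appear as titles.

```r
chr_vec <- c("Random Title", "Start", "dsf", "sdvf", "Stop", "Start", "dsf", "sdvf", "Stop",
             "Start", "dsf", "sdvf", "Stop", "Another Random Title",
             "Start", "erg", "vdf", "vfd", "efw", "Stop",
             "Start", "erg", "vdf", "vfd", "efw", "Stop",
             "Start", "erg", "vdf", "vfd", "efw", "Stop")

# Hypothetical helper: group Start...Stop blocks under the title preceding them.
# Assumes one title before each run of blocks and no title inside a block.
group_blocks <- function(x, start = "Start", stop = "Stop") {
  is_start <- x == start
  is_stop  <- x == stop
  # TRUE from each Start up to and including its matching Stop:
  # starts seen so far minus stops seen strictly before this position
  inside <- (cumsum(is_start) - c(0, head(cumsum(is_stop), -1))) > 0
  is_title <- !inside
  title_id <- cumsum(is_title)   # which title each position falls under
  block_id <- cumsum(is_start)   # which block each position belongs to
  # One character vector per block, in order of appearance
  blocks <- unname(split(x[inside], block_id[inside]))
  # Title of each block, read off at that block's Start position
  block_title <- x[is_title][title_id[is_start]]
  # Group the blocks by title, keeping the titles in their original order
  split(blocks, factor(block_title, levels = unique(block_title)))
}

res <- group_blocks(chr_vec)
str(res)
```

Because the blocks are grouped by the title text, two separate sections that happen to share the same title would be merged into one entry; if that can occur, split on `title_id[is_start]` instead of the title string.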

1 Answer

A solution in base R

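# Positions of the delimiters (exact matches, so a title that merely
# contains "Start" or "Stop" is not picked up)
t1 <- which(chr_vec == "Start")
t2 <- which(chr_vec == "Stop")
# Index range of each Start...Stop block; SIMPLIFY = FALSE keeps a list
# even when all blocks happen to have the same length
sek <- mapply(seq, t1, t2, SIMPLIFY = FALSE)

lst <- list()
for (i in seq_along(sek)) {
  # A new title begins whenever a block does not directly follow the
  # previous Stop; the title is the element just before this block's Start
  if (i == 1 || sek[[i]][1] - tail(sek[[i - 1]], 1) != 1) {
    tit <- chr_vec[sek[[i]][1] - 1]
    lst[[tit]] <- list()
    j <- 1
  }
  # Store the block as the j-th entry under the current title
  lst[[tit]][[j]] <- chr_vec[sek[[i]]]
  j <- j + 1
}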

resulting in

$`Random Title`
$`Random Title`[[1]]
[1] "Start" "dsf"   "sdvf"  "Stop" 

$`Random Title`[[2]]
[1] "Start" "dsf"   "sdvf"  "Stop" 

$`Random Title`[[3]]
[1] "Start" "dsf"   "sdvf"  "Stop" 


$`Another Random Title`
$`Another Random Title`[[1]]
[1] "Start" "erg"   "vdf"   "vfd"   "efw"   "Stop" 

$`Another Random Title`[[2]]
[1] "Start" "erg"   "vdf"   "vfd"   "efw"   "Stop" 

$`Another Random Title`[[3]]
[1] "Start" "erg"   "vdf"   "vfd"   "efw"   "Stop"
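One caveat about building the block ranges with mapply(seq, t1, t2): when every Start...Stop block has the same length, mapply() simplifies its result to a matrix, and sek[[i]] then picks out a single index instead of a whole block. Passing SIMPLIFY = FALSE (or using Map()) always returns a list. A small illustration with made-up positions:

```r
# Two blocks of equal length (3 elements each)
t1 <- c(1, 5)
t2 <- c(3, 7)

m <- mapply(seq, t1, t2)                    # simplified to a 3 x 2 matrix
is.matrix(m)                                # TRUE, so m[[2]] is just the number 2

l <- mapply(seq, t1, t2, SIMPLIFY = FALSE)  # always a list of index vectors
l[[2]]
# [1] 5 6 7
```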