
My problem is splitting a path, collecting every subpath up to each "$" occurrence (a kind of cumulative subpath), and creating a new variable for each of them.
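Put another way, the k-th cumulative subpath is the text before the k-th "$" with the "/$" separators removed. Here is a minimal base-R sketch of that definition for a single path. The helper name `cumulative_subpaths` is hypothetical, and it assumes every "$" is preceded by "/", as in all the example paths.

```r
# Hypothetical helper: k-th result = text before the k-th "$", minus "/$" separators.
# Assumes every "$" in the path is preceded by "/".
cumulative_subpaths <- function(path) {
  pos <- as.integer(gregexpr("$", path, fixed = TRUE)[[1]])  # positions of each "$"
  if (pos[1] == -1L) return(character(0))                     # path contains no "$"
  prefixes <- substring(path, 1, pos - 2)                     # stop before each "/$"
  gsub("/$", "", prefixes, fixed = TRUE)                      # drop inner "/$" pieces
}

cumulative_subpaths("A/A/$/B/$/A/$")   # returns c("A/A", "A/A/B", "A/A/B/A")
```

Because each result is just a prefix of the original string, this view does not need the previous variable at all, although the row-wise answers below build each column from the one before it.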

Doing it step by step, I get the desired output:

data<-data.frame(path=c("A/A/$/B/$/A/$","B/C/$","B/C/$/C/$/A/B/$"),stringsAsFactors=FALSE)
library(stringr)
# number of "$" in each path
data$tr<-str_count(data$path,"\\$")
# first segment without its trailing "/"
data$tr_1<-substr(sapply(strsplit(data$path, "\\$"), `[[`, 1),1,nchar(sapply(strsplit(data$path, "\\$"), `[[`, 1))-1)
# tr_1 plus the second segment ("" if the path has no second segment)
data$tr_2<-ifelse(is.na(sapply(strsplit(data$path, "\\$"), `[`, 2))==TRUE,
                  "",
                  paste0(data$tr_1,substr(sapply(strsplit(data$path, "\\$"), `[`, 2),1,nchar(sapply(strsplit(data$path, "\\$"), `[`, 2))-1)))
# tr_2 plus the third segment ("" if the path has no third segment)
data$tr_3<-ifelse(is.na(sapply(strsplit(data$path, "\\$"), `[`, 3))==TRUE,
                  "",
                  paste0(data$tr_2,substr(sapply(strsplit(data$path, "\\$"), `[`, 3),1,nchar(sapply(strsplit(data$path, "\\$"), `[`, 3))-1)))

[Image: output of the step-by-step version]

When I try to do the same in a loop, following the approach in "Creating new named variable in dataframe using loop and naming convention", the output is wrong:

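data<-data[,-c(4,5)]   # drop tr_2 and tr_3 so the loop can rebuild them
for (i in 2:max(data$tr)) {
  # data$tr_i-1 is not column tr_(i-1): it looks up a column named "tr_i"
  # (NULL here) and subtracts 1, so the previous column is never pasted in
  data[[paste0("tr_",i)]]<-ifelse(is.na(sapply(strsplit(data$path, "\\$"), `[`, i))==TRUE,
                  "",
                  paste0(data$tr_i-1,substr(sapply(strsplit(data$path, "\\$"), `[`, i),1,nchar(sapply(strsplit(data$path, "\\$"), `[`, i))-1)))
}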
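The loop fails because `data$tr_i-1` does not build the name of the previous column. A sketch of the same loop that looks the previous column up by name with `data[[paste0("tr_", i - 1)]]` is below. The helpers `piece` and `drop_last` are hypothetical names introduced here, and `lengths()` of the split replaces `str_count()`, which gives the same count only because every example path ends in "$".

```r
data <- data.frame(path = c("A/A/$/B/$/A/$", "B/C/$", "B/C/$/C/$/A/B/$"),
                   stringsAsFactors = FALSE)

pieces  <- strsplit(data$path, "\\$")   # split each path once and reuse it
data$tr <- lengths(pieces)              # segment count (assumes paths end in "$")

piece     <- function(k) sapply(pieces, `[`, k)       # k-th segment, NA if absent
drop_last <- function(x) substr(x, 1, nchar(x) - 1)   # strip the trailing "/"

data$tr_1 <- drop_last(piece(1))
for (i in seq_len(max(data$tr))[-1]) {                # 2, 3, ...; empty if max is 1
  prev <- data[[paste0("tr_", i - 1)]]                # previous column, by name
  cur  <- piece(i)
  data[[paste0("tr_", i)]] <- ifelse(is.na(cur), "", paste0(prev, drop_last(cur)))
}
data
```

Using `seq_len(max(data$tr))[-1]` instead of `2:max(data$tr)` avoids counting down (2, 1) when no path has more than one segment.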

[Image: output of the loop version]

Is there another way of doing this recursively, with each new variable built from the previous one?
Thanks in advance!

Answer


I'd do this:

data<-data.frame(path=c("A/A/$/B/$/A/$","B/C/$","B/C/$/C/$/A/B/$"),stringsAsFactors=FALSE)

#split strings
tmp <- strsplit(data$path, "/$", fixed = TRUE) #thanks to David
data$tr <- lengths(tmp)

#paste them together cumulatively
tmp <- lapply(tmp, Reduce, f = paste0, accumulate = TRUE)

#create data.frame
tmp <- lapply(tmp, `length<-`, max(lengths(tmp)))
tmp <- setNames(as.data.frame(do.call(rbind, tmp), stringsAsFactors = FALSE), 
                paste0("tr_", seq_len(max(data$tr))))

data <- cbind(data, tmp)
#             path tr tr_1  tr_2      tr_3
#1   A/A/$/B/$/A/$  3  A/A A/A/B   A/A/B/A
#2           B/C/$  1  B/C  <NA>      <NA>
#3 B/C/$/C/$/A/B/$  3  B/C B/C/C B/C/C/A/B
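The two key steps are `Reduce(paste0, ..., accumulate = TRUE)`, which returns every intermediate paste rather than only the final one, and `length<-`, which pads a shorter vector with NA so all rows can be `rbind`-ed into the same number of columns. Applied to single values from the example data:

```r
# Split one path on the literal "/$" separator
pieces <- strsplit("A/A/$/B/$/A/$", "/$", fixed = TRUE)[[1]]   # c("A/A", "/B", "/A")

# Cumulative paste: keeps "A/A", then "A/A" + "/B", then that + "/A"
acc <- Reduce(paste0, pieces, accumulate = TRUE)              # c("A/A", "A/A/B", "A/A/B/A")

# Pad a one-segment result to three entries with NA
padded <- `length<-`("B/C", 3)                                # c("B/C", NA, NA)
```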

If you need empty strings instead of NA values, you can replace them in another lapply loop.
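One way to write that extra `lapply` pass is sketched below. To keep the snippet runnable on its own, it retypes the result shown above as a literal data frame, and it assumes only the `tr_*` columns should be touched.

```r
# The result shown above, typed in directly so the snippet is self-contained
data <- data.frame(path = c("A/A/$/B/$/A/$", "B/C/$", "B/C/$/C/$/A/B/$"),
                   tr   = c(3L, 1L, 3L),
                   tr_1 = c("A/A", "B/C", "B/C"),
                   tr_2 = c("A/A/B", NA, "B/C/C"),
                   tr_3 = c("A/A/B/A", NA, "B/C/C/A/B"),
                   stringsAsFactors = FALSE)

tr_cols <- grep("^tr_", names(data))   # tr_1, tr_2, ... but not tr itself
data[tr_cols] <- lapply(data[tr_cols], function(x) replace(x, is.na(x), ""))
```

Assigning into `data[tr_cols]` keeps the result a data frame and leaves the other columns unchanged.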
