8

I have a set of file names like:

filelist <- c("filea-10.txt", "fileb-2.txt", "filec-1.txt", "filed-5.txt", "filef-4.txt")

and I would like to sort them according to the number after the "-".

In Python, for instance, I can use the key parameter of the sorted function:

filelist = ["filea-10.txt", "fileb-2.txt", "filec-1.txt", "filed-5.txt", "filef-4.txt"]
sorted(filelist, key=lambda x: int(x.split("-")[1].split(".")[0]))

> ["filec-1.txt", "fileb-2.txt", "filef-4.txt", "filed-5.txt", "filea-10.txt"]

In R, I have been experimenting with strsplit and lapply, with no luck so far.

How can I do this in R?

Edit: File names can be many things and may include more numbers. The only fixed pattern is that the number I want to sort by is after the "-". Another (real) example:

filelist <- c("boards10017-51.mp4",  "boards10065-66.mp4",  "boards10071-81.mp4",
              "boards10185-91.mp4",  "boards10212-63.mp4",  "boards1025-51.mp4",
              "boards1026-71.mp4",   "boards10309-89.mp4",  "boards10310-68.mp4",
              "boards10384-50.mp4",  "boards10398-77.mp4",  "boards10419-119.mp4",
              "boards10421-85.mp4",  "boards10444-87.mp4",  "boards10451-60.mp4",
              "boards10461-81.mp4",  "boards10463-52.mp4",  "boards10538-83.mp4",
              "boards10575-62.mp4",  "boards10577-249.mp4")
  • Will there always just be one number? Can't you just extract the numbers and order by that? Commented May 30, 2015 at 5:46
  • No, sorry. There are more numbers. A real example is boards451-74. I'll edit. Commented May 30, 2015 at 5:57
  • OK. I've added an update. Commented May 30, 2015 at 6:07

2 Answers

10

I'm not sure of the actual complexity of your list of file names, but something like the following might be sufficient:

filelist[order(as.numeric(gsub("[^0-9]+", "", filelist)))]
# [1] "filec-1.txt"  "fileb-2.txt"  "filef-4.txt"  "filed-5.txt"  "filea-10.txt"

Considering your edit, you may want to change the gsub to something like the following, which strips everything up to the last "-" and everything from the next "." onward:

gsub(".*-|\\..*", "", filelist)

Again, without a few more test cases, it's hard to say whether this is sufficient for your needs.


Example:

 x <- c("boards10017-51.mp4", "boards10065-66.mp4", "boards10071-81.mp4", 
     "boards10185-91.mp4", "boards10212-63.mp4", "boards1025-51.mp4",     
     "boards1026-71.mp4", "boards10309-89.mp4", "boards10310-68.mp4",     
     "boards10384-50.mp4", "boards10398-77.mp4", "boards10419-119.mp4",   
     "boards10421-85.mp4", "boards10444-87.mp4", "boards10451-60.mp4",    
     "boards10461-81.mp4", "boards10463-52.mp4", "boards10538-83.mp4",    
     "boards10575-62.mp4", "boards10577-249.mp4")  

x[order(as.numeric(gsub(".*-|\\..*", "", x)))]
##  [1] "boards10384-50.mp4"  "boards10017-51.mp4"  "boards1025-51.mp4"  
##  [4] "boards10463-52.mp4"  "boards10451-60.mp4"  "boards10575-62.mp4" 
##  [7] "boards10212-63.mp4"  "boards10065-66.mp4"  "boards10310-68.mp4" 
## [10] "boards1026-71.mp4"   "boards10398-77.mp4"  "boards10071-81.mp4" 
## [13] "boards10461-81.mp4"  "boards10538-83.mp4"  "boards10421-85.mp4" 
## [16] "boards10444-87.mp4"  "boards10309-89.mp4"  "boards10185-91.mp4" 
## [19] "boards10419-119.mp4" "boards10577-249.mp4" 

1 Comment

Could also target the numbers directly: sub(".*-(\\d+).*", "\\1", x)
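
The capture-group variant from this comment could be used as the sort key roughly as follows. This is only a sketch on a shortened, illustrative subset of the file names; the variable names key and sorted are assumptions, not part of the original answer.

```r
# Shortened subset of the file names from the question (illustrative only).
x <- c("boards10017-51.mp4", "boards1025-51.mp4",
       "boards10419-119.mp4", "boards10384-50.mp4")

# ".*-" is greedy, so it consumes everything up to the last "-";
# "(\\d+)" captures the digits that follow, and "\\1" keeps only that group.
key <- as.numeric(sub(".*-(\\d+).*", "\\1", x))

# order() returns the permutation that sorts the keys; index x with it.
sorted <- x[order(key)]
print(sorted)
```

Unlike the gsub version, this does not depend on the number being followed by a ".", but a name without "-" followed by digits is returned unchanged by sub and would become NA in as.numeric.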
0

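I wrote a regex sort function. It needs the magrittr, purrr, stringr, and data.table packages:

function:

library(magrittr)    # %>% and %<>%
library(purrr)       # map, map_chr, map_lgl
library(stringr)     # str_extract
library(data.table)  # as.data.table, .SD

reg_sort <- function(x,...,verbose=F) {
    # Deparse the unevaluated arguments so that a leading "-" can be detected as text
    ellipsis <-   sapply(as.list(substitute(list(...)))[-1], deparse, simplify="array")
    reg_list <-   paste0(ellipsis, collapse=',')
    # Split into individual patterns and undo the backslash doubling added by deparse
    reg_list %<>% strsplit(",") %>% unlist %>% gsub("\\\\","\\",.,fixed=T)
    # Strip the optional leading "-" and the surrounding quotes
    pattern  <-   reg_list %>% map_chr(~sub("^-\\\"","",.) %>% sub("\\\"$","",.) %>% sub("^\\\"","",.) %>% trimws)
    # TRUE for patterns prefixed with "-" (sort descending)
    descInd  <-   reg_list %>% map_lgl(~grepl("^-\\\"",.)%>%as.logical)

    # One column of extracted matches per pattern, plus the original strings as the last column
    reg_extr <-   pattern %>% map(~str_extract(x,.)) %>% c(.,list(x)) %>% as.data.table
    reg_extr[] %<>% lapply(., function(x) type.convert(as.character(x), as.is = TRUE))

    # Sort by the last pattern first so that the first pattern ends up as the primary key
    map(rev(seq_along(pattern)),~{reg_extr<<-reg_extr[order(reg_extr[[.]],decreasing = descInd[.])]})

    # Optionally print the unique values extracted for each pattern
    if(verbose) { tmp<-lapply(reg_extr[,.SD,.SDcols=seq_along(pattern)],unique);names(tmp)<-pattern;tmp %>% print }

    return(reg_extr[[ncol(reg_extr)]])
}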

data:

filelist <- c("filea-10.txt", "fileb-2.txt", "filec-1.txt", "filed-5.txt", "filef-4.txt")

Call the function:

reg_sort(filelist,"\\d+")
#[1] "filec-1.txt"  "fileb-2.txt"  "filef-4.txt"  "filed-5.txt"  "filea-10.txt"

Other features:

  • Sort descending: reg_sort(filelist,-"\\d+")

    #[1] "filea-10.txt" "filed-5.txt" "filef-4.txt" "fileb-2.txt" "filec-1.txt"

  • Multi layer sorting: reg_sort(filelist,-"\\d+","\\w") (does not make sense with this example data)

  • Verbose mode: reg_sort(filelist,"\\d+",verbose=T) (see/check what the regEx pattern has extracted in order to sort)

    $\\d+
    [1] 1 2 4 5 10

    [1] "filec-1.txt" "fileb-2.txt" "filef-4.txt" "filed-5.txt" "filea-10.txt"
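
For comparison, the multi-layer idea (number descending, then name ascending) can also be sketched in base R by passing several keys to order(). This is not the reg_sort function above, and the file names and the variables files, num, stem, and sorted are hypothetical, chosen so that a tie on the number makes the second key matter.

```r
# Hypothetical file names with a tie on the number after "-".
files <- c("b-2.txt", "a-10.txt", "a-2.txt", "c-1.txt")

num  <- as.numeric(sub(".*-(\\d+).*", "\\1", files))  # digits after the last "-"
stem <- sub("-.*", "", files)                          # text before the first "-"

# Primary key: number, descending (negated); secondary key: stem, ascending.
sorted <- files[order(-num, stem)]
print(sorted)
```

Negating the numeric key is a simple way to get a descending primary key while the secondary key stays ascending.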
