For loop within a for loop for iterating files of different extensions

Question

Say I have 20 different files. The first 10 end with .counts.tsv and the rest end with .libsize.tsv, and for each .counts.tsv file there is a matching .libsize.tsv file. I would like to use a for loop to select each pair of files and run an R script on them. Here is what I tried:

#!/bin/bash
arti='/home/path/tofiles'
for counts in "$arti"/*__counts.tsv; do
    for libsize in "$arti"/*__libsize.tsv; do
        Rscript score.R "$counts" "$libsize"
    done
done

The above shell script iterates far more often than expected, because the inner loop pairs every counts file with every libsize file, whereas I have only 20 files. I need the Rscript to run 10 times, once per matching pair. Any suggestions would be appreciated.
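To see where the extra runs come from, here is a small bash sketch that only counts calls instead of running Rscript. It builds a throwaway directory with three hypothetical samples (s01, s02, s03) using the question's __counts.tsv / __libsize.tsv naming; the nested loop makes one call per combination, the single loop one call per sample.

```bash
#!/usr/bin/env bash
# Sketch: count how many Rscript calls a nested loop makes versus a single loop.
# Nothing is executed; hypothetical sample names in a temp directory are only counted.
set -euo pipefail

demo_dir=$(mktemp -d)
for s in s01 s02 s03; do
  touch "$demo_dir/${s}__counts.tsv" "$demo_dir/${s}__libsize.tsv"
done

nested=0
for counts in "$demo_dir"/*__counts.tsv; do
  for libsize in "$demo_dir"/*__libsize.tsv; do
    nested=$((nested + 1))   # one call per (counts, libsize) combination
  done
done

single=0
for counts in "$demo_dir"/*__counts.tsv; do
  single=$((single + 1))     # one call per sample
done

echo "nested loop: $nested calls, single loop: $single calls"
rm -r "$demo_dir"
```

With N samples the nested version makes N*N calls instead of N, which is why it overshoots so badly.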

In the end, I need to execute the R script on each counts/libsize pair.
– ARJ, Commented Jun 13, 2019 at 14:41
"10 times for both files", so 20 iterations total? Hopefully the files are named with similar first parts, i.e. do you have myFile__libsize.tsv and myFile__counts.tsv? Then you only need one loop: strip the extension from the variable returned by the loop and add it back to two copies on your Rscript line, i.e. Rscript score.R "${myF}__counts.tsv" "${myF}__libsize.tsv". Good luck.
– shellter, Commented Jun 13, 2019 at 14:53
The Rscript should run only 10 times, hence 10 iterations. To be clearer: for every .counts.tsv file there is a matching .libsize.tsv file, so there are 20 files in total, but the Rscript should run only 10 times.
– ARJ, Commented Jun 13, 2019 at 14:55

Matt Summersgill · Accepted Answer · 2019-06-13 16:17:55Z

I started typing up an answer before seeing your comment that you're only interested in a bash solution; I'm posting it anyway in case someone finds this question in the future and is open to an R-based solution.

If I were approaching this from scratch, I'd probably just use an R function defined in the file that takes the two file names instead of messing around with the system() calls, but this would provide the behavior you desire.

## Get a vector of files matching each extension
## (regular expressions: escape the dots and anchor at the end;
##  adjust the suffix if your files end in __counts.tsv / __libsize.tsv)
counts_names  <- list.files(path = ".", pattern = "\\.counts\\.tsv$")
libsize_names <- list.files(path = ".", pattern = "\\.libsize\\.tsv$")

## Get the root names of the files before the extensions
counts_roots  <- sub("\\.counts\\.tsv$", "", counts_names)
libsize_roots <- sub("\\.libsize\\.tsv$", "", libsize_names)

## Keep only root names that have both file types
shared_roots <- intersect(libsize_roots, counts_roots)

## Loop through the shared root names and run score.R on each pair
for (root in shared_roots) {
  counts_filename  <- paste0(root, ".counts.tsv")
  libsize_filename <- paste0(root, ".libsize.tsv")

  ## shQuote() protects file names that contain spaces
  command <- paste("Rscript score.R", shQuote(counts_filename), shQuote(libsize_filename))
  system(command)
}
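A rough bash counterpart of the same shared-roots idea, as a sketch: it walks the *.counts.tsv files, derives the root by stripping the extension, and only emits a command when the matching *.libsize.tsv exists. print_pairs is a hypothetical helper name, and it prints the Rscript commands (a dry run) rather than executing them.

```bash
#!/usr/bin/env bash
# Dry-run sketch of the "shared roots" idea: only samples with both files get a command.
set -euo pipefail

# print_pairs DIR: hypothetical helper; prints one Rscript command per complete pair
print_pairs() {
  local dir=$1 counts root libsize
  for counts in "$dir"/*.counts.tsv; do
    [ -e "$counts" ] || continue           # glob matched nothing
    root=${counts%.counts.tsv}             # strip the extension to get the root name
    libsize="$root.libsize.tsv"
    if [ -e "$libsize" ]; then
      printf 'Rscript score.R %q %q\n' "$counts" "$libsize"
    else
      echo "skipping $counts: no $libsize" >&2
    fi
  done
}

# Throwaway demo: sample "a" is complete, sample "b" lacks its libsize file
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
touch "$demo_dir/a.counts.tsv" "$demo_dir/a.libsize.tsv" "$demo_dir/b.counts.tsv"
print_pairs "$demo_dir"
```

Replacing the printf with a direct Rscript call turns the dry run into the real loop.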

Walter A · Accepted Answer · 2019-06-13 22:04:25Z


Construct the second filename from the first with ${counts%counts.tsv}, which removes the trailing counts.tsv, then append libsize.tsv. One loop is enough.

#!/bin/bash
arti='/home/path/tofiles'
for counts in "$arti"/*__counts.tsv; do
    libsize="${counts%counts.tsv}libsize.tsv"
    Rscript score.R "$counts" "$libsize"
done
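A quick illustration of the expansion on a hypothetical path, plus one caveat: without nullglob, a glob that matches nothing is passed through literally, so the loop above would run once on the unexpanded pattern. Setting the bash option shopt -s nullglob makes it expand to nothing instead.

```bash
#!/usr/bin/env bash
# Sketch: what ${counts%counts.tsv} does, on a hypothetical path.
counts='/home/path/tofiles/sampleA__counts.tsv'
prefix=${counts%counts.tsv}       # removes the trailing "counts.tsv"
libsize="${prefix}libsize.tsv"
echo "$prefix"                    # /home/path/tofiles/sampleA__
echo "$libsize"                   # /home/path/tofiles/sampleA__libsize.tsv

# With nullglob, a pattern that matches nothing expands to zero words
# instead of staying as the literal string '/nonexistent-dir/*__counts.tsv'.
shopt -s nullglob
matches=( /nonexistent-dir/*__counts.tsv )
echo "${#matches[@]}"             # 0
```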

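EDIT:
Trying to make it a one-liner is less safe. When the filenames contain no spaces or newlines, and every counts file has a matching libsize file, you can risk an accident by pairing the two sorted lists line by line:

paste -d' ' <(printf '%s\n' "$arti"/*__counts.tsv) <(printf '%s\n' "$arti"/*__libsize.tsv) | xargs -n2 Rscript score.R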

and when you feel really lucky (no files other than those .tsv files in $arti, so that the alphabetical glob order puts each X__counts.tsv right before its X__libsize.tsv), make a bungee jump with

echo ${arti}/* | xargs -n2 Rscript score.R
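If spaces in file names are a concern, a NUL-delimited variant keeps the one-liner spirit: the loop emits each counts name followed by its derived libsize name, each terminated by a NUL byte, and xargs -0 -n2 consumes them two at a time. This is a sketch on a throwaway directory with hypothetical sample names; the echo in front of Rscript makes it a dry run, so drop it to actually run score.R.

```bash
#!/usr/bin/env bash
# Sketch: NUL-delimited pairing that survives spaces in file names (dry run via echo).
set -euo pipefail
shopt -s nullglob

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
touch "$demo_dir/sample A__counts.tsv" "$demo_dir/sample A__libsize.tsv" \
      "$demo_dir/sampleB__counts.tsv" "$demo_dir/sampleB__libsize.tsv"

pairs_out=$(
  for c in "$demo_dir"/*__counts.tsv; do
    # counts name, then the matching libsize name, each terminated by a NUL byte
    printf '%s\0%s\0' "$c" "${c%counts.tsv}libsize.tsv"
  done | xargs -0 -n2 echo Rscript score.R
)
printf '%s\n' "$pairs_out"
```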

edited Jun 13, 2019 at 22:04

answered Jun 13, 2019 at 18:54

Walter A


2 Comments

ARJ Over a year ago

Thanks, I have another solution posted below :)

Walter A Over a year ago

Your solution uses the same idea, but calling both basename and awk is slower. In this case the performance won't matter; it becomes important when you loop through large files and do something for each line.
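To illustrate the point about basename: for an ordinary file path without a trailing slash, the expansion ${f##*/} gives the same result as basename without starting an extra process, and a ${...%__counts.tsv} expansion can replace the awk step. The path below is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: builtin parameter expansion versus the external basename command.
f='/home/path/tofiles/sampleA__counts.tsv'   # hypothetical path

base_ext=$(basename "$f")        # runs an external program
base_builtin=${f##*/}            # same result, computed by bash itself
sample=${base_builtin%__counts.tsv}

echo "$base_ext"                 # sampleA__counts.tsv
echo "$base_builtin"             # sampleA__counts.tsv
echo "$sample"                   # sampleA
```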

bjorn2bewild · Accepted Answer · 2019-06-13 14:51:51Z


Have you tried list.files in base R? It lets you use all files in the folder.

arti <- '/home/path/tofiles'
for (f in list.files(arti, full.names = TRUE)) {
  # run your script on file f here
}

answered Jun 13, 2019 at 14:51

bjorn2bewild


3 Comments

ARJ Over a year ago

The files I need have two different extensions. Say I have files that end with counts.tsv and libsize.tsv; these need to be selected separately for the Rscript. Hence, your solution won't work.

Aaron - mostly inactive Over a year ago

@user1017373: This is almost certainly going to be the right tool to use, though. Perhaps you'll need to separate the list somehow after you get it? Please clarify the question, it's not clear how with 10 files of each type, you want the script to run only 10 times. There's something you're not telling us...

ARJ Over a year ago

@Aaron, thanks for the comment. Yes: for instance, I have 10 samples, each with a counts.tsv file and a matching libsize.tsv file. Therefore, in the end I need only 10 iterations, even though the folder contains 20 files.
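One way to follow Aaron's suggestion in bash: list the folder once, then sort the entries into two arrays by suffix with a case statement. This is a sketch; split_by_suffix and the counts_files / libsize_files arrays are hypothetical names, and the suffixes follow the question's __counts.tsv / __libsize.tsv files.

```bash
#!/usr/bin/env bash
# Sketch: list the folder once, then split the listing by suffix.
# split_by_suffix fills two global arrays (hypothetical names).
split_by_suffix() {
  local f
  counts_files=()
  libsize_files=()
  for f in "$1"/*; do
    case $f in
      *__counts.tsv)  counts_files+=( "$f" ) ;;
      *__libsize.tsv) libsize_files+=( "$f" ) ;;
      *) ;;                      # any other file is ignored
    esac
  done
}

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
touch "$demo_dir"/{s1,s2}__counts.tsv "$demo_dir"/{s1,s2}__libsize.tsv "$demo_dir"/notes.txt
split_by_suffix "$demo_dir"
echo "${#counts_files[@]} counts, ${#libsize_files[@]} libsize"   # 2 counts, 2 libsize
```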

Theo · Accepted Answer · 2019-06-13 16:58:07Z


See whether the below helps.

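my_list <- list.files("./Data", full.names = TRUE)   # full paths, so Rscript can find the files
counts  <- grep("counts\\.tsv$", my_list, value = TRUE)
libsize <- grep("libsize\\.tsv$", my_list, value = TRUE)

## Pairs by position: relies on both vectors listing the samples in the same (alphabetical) order
for (i in seq_along(counts)) {
  system(paste("Rscript score.R", shQuote(counts[i]), shQuote(libsize[i])))
}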
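Pairing by index silently goes wrong if one sample is missing a file. A bash sketch of the same idea with a sanity check: both globs are expanded into arrays, the lengths are compared, and each pair must share the same sample prefix before a command is emitted. pair_by_index is a hypothetical name, and echo makes it a dry run.

```bash
#!/usr/bin/env bash
# Sketch: pair two sorted arrays by index, refusing pairs whose sample prefixes differ.
shopt -s nullglob

pair_by_index() {
  local dir=$1 i
  local counts=( "$dir"/*__counts.tsv )
  local libsize=( "$dir"/*__libsize.tsv )
  if [ "${#counts[@]}" -ne "${#libsize[@]}" ]; then
    echo "different numbers of counts and libsize files" >&2
    return 1
  fi
  for i in "${!counts[@]}"; do
    # both names must reduce to the same sample prefix, otherwise the orders diverged
    if [ "${counts[i]%__counts.tsv}" != "${libsize[i]%__libsize.tsv}" ]; then
      echo "mismatched pair: ${counts[i]} vs ${libsize[i]}" >&2
      return 1
    fi
    echo Rscript score.R "${counts[i]}" "${libsize[i]}"
  done
}

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
touch "$demo_dir"/{s1,s2}__counts.tsv "$demo_dir"/{s1,s2}__libsize.tsv
pair_by_index "$demo_dir"
```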

edited Jun 13, 2019 at 16:58

answered Jun 13, 2019 at 16:16

Theo


2 Comments

Aaron - mostly inactive Over a year ago

This seems like a mix of bash and R and so wouldn't actually run; am I missing something?

Theo Over a year ago

The idea was to bring both the files simultaneously inside the for loop. Editing the answer.

ARJ · Accepted Answer · 2019-06-14 08:42:35Z


Finally, I tried the following, and it worked for me:

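arti='/home/path/tofiles'
for sam in "$arti"/*__counts.tsv; do
    filebase=$(basename "$sam")
    # awk keeps the part of the file name before the first '-' or '1'
    samples=$(printf '%s\n' "$filebase" | awk -F'[-1]' '{print $1}')
    # the names below are relative, so this assumes the script is run from inside $arti
    Rscript score.R "${samples}__counts.tsv" "${samples}__libsize.tsv"
done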

For anyone looking for something similar :)

answered Jun 14, 2019 at 8:42

ARJ

