
I have a folder with a mixture of file types (.bam, .bam.bai, and .log). I created a for loop that runs two commands on each of the .bam files. My current code directs the output of each command into a separate CSV file, because I could not figure out how to direct the outputs into separate columns of one file.

TYIA!

Question 1
I want to export the output from both commands into the same CSV. How can I alter my code so that the output of my first command is saved as the first column of a CSV file, and the output of my second command is saved as the second column of the same file?

Question 2
What is the name of the syntax used to select files in a for loop? For instance, the * in *.bam represents a wildcard. Is this regex? I had a tough time altering the pattern so that only .bam files were selected for the loop (and .bam.bai files were excluded). I ended up with *[.bam] by guessing and empirically testing my outputs. Are there any websites that explain this syntax well and provide lots of examples (coder level: newbie)?

Current Code

# Start each output file with a header line (> truncates any old contents)
echo "Sample" > ~/Desktop/Sample_Names.csv
echo "Total_Reads" > ~/Desktop/Read_Counts.csv

# Append one line per matching file to each CSV
for file in *[.bam]
do
  # samtools view -c prints the number of records in the file
  samtools view -c "$file" >> ~/Desktop/Read_Counts.csv
  gawk -v RS="^$" '{print FILENAME}' "$file" >> ~/Desktop/Sample_Names.csv
done

Current Outputs (truncated)

>Sample_Names.csv
| Sample       |
|--------------|
| B40-JV01.bam |
| B40-JV02.bam |
| B40-JV03.bam |

>Read_Counts.csv
| Total_Reads |
|-------------|
| 3835555     |
| 4110463     |
| 144558      |

Desired Output

>Combined_Outputs.csv
| Sample       | Total_Reads |
|--------------|-------------|
| B40-JV01.bam | 3835555     |
| B40-JV02.bam | 4110463     |
| B40-JV03.bam | 144558      |
  • They're called globs or wildcard patterns. See mywiki.wooledge.org/BashGuide/Patterns (commented Jun 16, 2022 at 13:27)
  • gawk -v RS="^$" '{print FILENAME}' $file is equivalent to echo $file for non-empty files. Not sure what you were going for with that gawk command. (commented Jun 16, 2022 at 19:34)
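To make the glob point concrete, here is a minimal sketch that runs both patterns against a throwaway directory. The file names are hypothetical stand-ins; `notes.m` in particular is invented only to show that `*[.bam]` worked by coincidence. In a glob, `[.bam]` is a bracket expression matching exactly one character from the set `.`, `b`, `a`, `m`, while `*.bam` must match the whole name, so it only accepts names that end in the literal text `.bam`.

```bash
#!/usr/bin/env bash
# Sketch: compare the bracket pattern *[.bam] with the plain glob *.bam.
# All file names here are hypothetical stand-ins for the real data.
set -euo pipefail

orig_dir=$PWD
demo_dir=$(mktemp -d)
cd "$demo_dir"
touch B40-JV01.bam B40-JV01.bam.bai run.log notes.m

# [.bam] matches ONE character that is '.', 'b', 'a' or 'm', so *[.bam]
# means "any name whose last character is one of those four". That skips
# .bam.bai (ends in 'i') and .log (ends in 'g'), but also lets notes.m in.
bracket_matches=(*[.bam])

# *.bam means "any name ending in the literal text .bam". A glob has to
# match the entire name, so B40-JV01.bam.bai is not selected.
suffix_matches=(*.bam)

printf 'bracket: %s\n' "${bracket_matches[*]}"
printf 'suffix:  %s\n' "${suffix_matches[*]}"

cd "$orig_dir"
rm -r "$demo_dir"
```

Running it should list `B40-JV01.bam notes.m` for the bracket pattern and only `B40-JV01.bam` for `*.bam`.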

1 Answer


Something like

echo "Sample,Total_Reads" > Combined_Outputs.csv
for file in *.bam; do
    printf "%s,%s\n" "$file" "$(samtools view -c "$file")"
done >> Combined_Outputs.csv

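This prints one line per file (the file name, a comma, then the read count) and moves the output redirection outside the loop, so Combined_Outputs.csv is opened once rather than on every iteration.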
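If the two separate files produced by the question's original loop already exist, they can also be merged after the fact with the standard `paste` utility. This is a minimal sketch, assuming both files list the samples in the same order (true when one loop wrote both). The file contents below are stand-ins copied from the question's truncated output, written to a temporary directory rather than ~/Desktop.

```bash
#!/usr/bin/env bash
# Sketch: join two single-column files side by side into one CSV.
# Assumes both files have the same number of lines, in the same order.
set -euo pipefail

work_dir=$(mktemp -d)
# Stand-in contents mirroring the question's truncated output.
printf 'Sample\nB40-JV01.bam\nB40-JV02.bam\n' > "$work_dir/Sample_Names.csv"
printf 'Total_Reads\n3835555\n4110463\n' > "$work_dir/Read_Counts.csv"

# paste joins line N of each input into one output line; -d, sets the
# separator between the columns to a comma.
paste -d, "$work_dir/Sample_Names.csv" "$work_dir/Read_Counts.csv" \
    > "$work_dir/Combined_Outputs.csv"

combined=$(cat "$work_dir/Combined_Outputs.csv")
printf '%s\n' "$combined"
rm -r "$work_dir"
```

The single-loop approach in the answer is still the more robust choice: if one command fails for a file, two separately written files can end up with misaligned rows, and `paste` has no way to notice.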


4 Comments

Thank you, this works! Can't say I understand the syntax, though. Is the semicolon in *.bam; part of the glob syntax? Is %s also part of the glob syntax? Thanks for your help!
@JVGen The semicolon is part of the for loop syntax. And the %s's are printf format specifiers. Nothing to do with globs.
How does "*.bam" specify ".bam" while excluding ".bam.bai" files? I thought using *.bam alone would give me issues, but it seems to work!
@JVGen you seem to be confusing regexps with globbing patterns, see the reference Shawn provided earlier.
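To expand on the printf point in the comments: each `%s` in the format string is replaced by the next argument, in order, and `\n` ends the line. In the answer, the first argument is `"$file"` and the second is whatever `$(samtools view -c "$file")` prints. The sketch below uses literal values taken from the question's sample output in place of the samtools call.

```bash
#!/usr/bin/env bash
# Sketch: how printf fills in %s format specifiers.
set -euo pipefail

# Two specifiers, two arguments: one CSV row.
one_row=$(printf '%s,%s\n' "B40-JV01.bam" "3835555")

# With more arguments than specifiers, printf reuses the format string,
# so four arguments produce two CSV rows.
two_rows=$(printf '%s,%s\n' "B40-JV01.bam" "3835555" "B40-JV02.bam" "4110463")

printf '%s\n' "$one_row" "$two_rows"
```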
