0

I have a python script as follow:

#!/usr/bin/python
from Bio import SeqIO

fasta_file = "input.fa" # Input fasta file
wanted_file = "A_ids.txt" # Input interesting sequence IDs, one per line
result_file = "A.fasta" # Output fasta file

wanted = set()
with open(wanted_file) as f:
    for line in f:
        line = line.strip()
        if line != "":
            wanted.add(line)

fasta_sequences = SeqIO.parse(open(fasta_file),'fasta')
with open(result_file, "w") as f:
    for seq in fasta_sequences:
        if seq.id in wanted:
            SeqIO.write([seq], f, "fasta")

I would like to run the script above for the same input file, but for 40 different wanted_files - with different names - A_ids.txt, B_ids.txt, etc. And I would like to have their respective different outputs - A.fasta, B.fasta, etc.

Do I need to change my python script or I need to create a loop to run it for all my wanted files?

thanks

1
  • 2
    You can certainly make a list of all your input files and the corresponding outputs and loop over it. If this is a process you expect to repeat, it might be worthwhile to change your script to read the filenames in from a file and then write a function to construct output filenames from that. Commented Nov 11, 2016 at 17:04

3 Answers 3

4

I agree with @BlackVegetable. Set it to use command line arguments, by doing something like this:

#!/usr/bin/python
from Bio import SeqIO

import sys # for sys.argv

fasta_file = sys.argv[1] # This is now going to be name.fa, the fasta file
wanted_file = sys.argv[2] # This is now going to be name_ids.txt, or whatever you passed
# as an argument
result_file = sys.argv[3] # Output fasta file, now passed as arg

wanted = set()
with open(wanted_file) as f:
    for line in f:
        line = line.strip()
        if line != "":
            wanted.add(line)

fasta_sequences = SeqIO.parse(open(fasta_file),'fasta')
with open(result_file, "w") as f:
    for seq in fasta_sequences:
        if seq.id in wanted:
            SeqIO.write([seq], f, "fasta")

Then you could call the program with python input.fa A_ids.txt A.fasta, in your case. Or, python inputB.fa B_ids.txt B.fasta.

Sign up to request clarification or add additional context in comments.

4 Comments

This is more lightweight than argparse, and probably more appropriate for the OP. Also, you gave a helpful, concrete application. +1
thank you @BlackVegetable my apologies for stealing the glory XD
but in this case I need to run the command 40 times...thanks!
See stackoverflow.com/a/10523501/758446 for how to loop over all files in a directory with this command.
0

Consider having this program taking commandline options. This would allow you to read the wanted_file name from the commandline as an argument and you could deduce the appropriate output file name by parsing the given argument and following a pattern (such as replace extension given with .fasta) or having the output pattern be another command line argument of some sort.

You could call your program as python my_script.py A_ids.txt and loop over that via bash. You could also choose to allow for a variable number of arguments, each of which would invoke your logic for the given name.

My preference for dealing with command line arguments is https://docs.python.org/3.3/library/argparse.html and https://docs.python.org/2/library/argparse.html depending on your python version.

(Additionally, if you take the path of using a single command line argument for the wanted_file, you could simply output the contents to stdout via print or similar functions and use a redirection operator in the command line to send the output to a filename provided there: python my_script.py A_ids.txt > A.fasta)

Comments

0

I think simpler way could be storing the 40 filenames in a file (in the code: wanted_filenames_file), store them in an array (wanted_files) and loop along each one of the files :

# !/usr/bin/python
from Bio import SeqIO

fasta_file = "input.fa"  # Input fasta file
wanted_filenames_file = "filenames.txt"
with open(wanted_filenames_file) as f:
    wanted_files = f.readlines()
result_file = []  # Output fasta file
for wanted_file in wanted_files:
    wanted = set()
    with open(wanted_file) as f:
        for line in f:
            line = line.strip()
            if line != "":
                wanted.add(line)

    fasta_sequences = SeqIO.parse(open(fasta_file), 'fasta')
    result_file = wanted_file.replace("_ids.txt", ".fasta")
    with open(result_file, "w") as f:
        for seq in fasta_sequences:
            if seq.id in wanted:
                SeqIO.write([seq], f, "fasta")

4 Comments

Did not work. I create a file with all file names that I want (filenames.txt) for the wanted files. I got some errors - from: can't read /var/mail/Bio ./modified.py: line 4: fasta_file: command not found ./modified.py: line 5: wanted_filenames_file: command not found ./modified.py: line 6: syntax error near unexpected token (' ./modified.py: line 6: with open(wanted_filenames_file) as f:'
@Clarissa are three errors raised at the same time?. I think i don't explained it clearly. You have to create a file called filenames.txt, where you should store all the filenames, one per line.
Yes I did it. I created a file called filenames.txt with all filenames inside, one per line.
But the 3 errors are raised at the same time?. ¿Could you share the stacktrace error in pastebin?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.