2

I am using this code to search for a target string in a single input file (input.txt) and extract the lines containing it into an output file (output.txt). Now I want to do the same with several input files, for instance input1.txt, input2.txt, input3.txt, ...

How can I modify this code to do this?

from collections import deque
input_file = 'input.txt' 
output_file = 'output11.txt' 
buscado = 'TCGCCATCCGAATTCCA'

contexto = deque([], 4)  # for keeping the last 4 lines


with open(input_file) as f_in, open(output_file, "w") as f_out:
  # A for loop over `f_in` yields one line at a time
  for line in f_in:
    contexto.append(line)       
    if  len(contexto) < 4:      
      continue
    if buscado in contexto[1]:  
      f_out.writelines(contexto) 

Does anyone have any suggestions? I've been struggling for hours :C

4 Answers

6

Consider using the fileinput module.

import fileinput
from collections import deque
output_file = 'output11.txt' 
buscado = 'TCGCCATCCGAATTCCA'

contexto = deque([], 4)  # for keeping the last 4 lines


with open(output_file, "w") as f_out:
    for line in fileinput.input(files=["input1.txt", "input2.txt"]):
        contexto.append(line)       
        if len(contexto) < 4:      
            continue
        if buscado in contexto[1]:  
            f_out.writelines(contexto) 
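
One caveat when several files are read as a single stream: the deque keeps its lines when fileinput moves on to the next file, so a window can straddle two files. Below is a minimal sketch that clears the window at each file boundary with fileinput.isfirstline(), assuming windows that span two files are not wanted. The function name extract_with_context and its parameters are made up for illustration.

```python
import fileinput
from collections import deque


def extract_with_context(input_files, output_file, target, window=4):
    """Write every `window`-line block whose second line contains `target`.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    context = deque(maxlen=window)  # last `window` lines of the current file
    with open(output_file, "w") as f_out, \
            fileinput.input(files=input_files) as f_in:
        for line in f_in:
            if fileinput.isfirstline():
                # A new file has just started: drop leftover lines
                # from the previous file.
                context.clear()
            context.append(line)
            if len(context) == window and target in context[1]:
                f_out.writelines(context)
```

It could be called as extract_with_context(["input1.txt", "input2.txt"], "output11.txt", "TCGCCATCCGAATTCCA").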

3

Have you considered multithreading? You could do it like this:

from concurrent.futures import ThreadPoolExecutor

BUSCADO = 'TCGCCATCCGAATTCCA'

def process(fnum):
    with open(f'input{fnum}.txt') as infile:
        lines = infile.readlines()
        with open(f'output{fnum}.txt', 'w') as outfile:
            for line in lines[4:]:
                if BUSCADO in line:
                    outfile.write(line)

def main():
    with ThreadPoolExecutor() as executor:
        executor.map(process, range(1, 4))

if __name__ == '__main__':
    main()
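
Note that the code above writes only the matching line itself and skips the first four lines of each file, which differs from the four-line window in the question. Here is a hedged sketch of a threaded variant that keeps the question's window logic, assuming one output file per input file. The names process_all and pairs are invented for this example.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor

BUSCADO = 'TCGCCATCCGAATTCCA'


def process(input_file, output_file, target=BUSCADO, window=4):
    """Copy each `window`-line block whose second line contains `target`."""
    context = deque(maxlen=window)  # created per call, so threads share no state
    with open(input_file) as infile, open(output_file, 'w') as outfile:
        for line in infile:
            context.append(line)
            if len(context) == window and target in context[1]:
                outfile.writelines(context)
    return output_file


def process_all(pairs, max_workers=None):
    """Run `process` on (input_file, output_file) pairs in a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(process, src, dst) for src, dst in pairs]
        # result() re-raises any exception from a worker, e.g. a missing file.
        return [future.result() for future in futures]
```

Collecting the results matters here: with a bare executor.map(...) whose return value is never iterated, an exception inside a worker goes unnoticed.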

3 Comments

Thanks, that's an interesting approach. How many threads are called by ThreadPoolExecutor()?
@C-3PO In this case it will be three
I can't understand what you have done, haha. I had never heard of "multithreading".
1

From what I understood, you need to repeat the search procedure, but now scanning multiple input files. In that case, you can create a nested for-loop for the input files:

from collections import deque
all_input_files = ['input.txt'] # add new files here
output_file     = 'output11.txt' 
buscado         = 'TCGCCATCCGAATTCCA'

contexto        = deque([], 4)  # for keeping the last 4 lines

with open(output_file, "w") as f_out:
    for input_file in all_input_files:
        contexto.clear()  # do not carry lines over from the previous file
        with open(input_file,"r") as f_in:
            # A for loop over `f_in` yields one line at a time
            for line in f_in:
                contexto.append(line)       
                if  len(contexto) < 4:      
                    continue
                if buscado in contexto[1]:  
                    f_out.writelines(contexto) 

1

For example:

from collections import deque
buscado = 'TCGCCATCCGAATTCCA'

contexto = deque([], 4)  # for keeping the last 4 lines

input_file_list = ["input1.txt", "input2.txt", "input3.txt"]

for input_file in input_file_list:
    contexto.clear()  # start each file with an empty window
    output_file = input_file.replace("input", "output")
    with open(input_file) as f_in, open(output_file, "w") as f_out:
      # A for loop over `f_in` yields one line at a time
      for line in f_in:
        contexto.append(line)
        if  len(contexto) < 4:
          continue
        if buscado in contexto[1]:
          f_out.writelines(contexto)

Edit

This solution creates one output file per input file, named after that input file. After some discussion, it seems you probably want to write all the data to a single file, which requires slightly different code, similar to the other answers:

from collections import deque

buscado = 'TCGCCATCCGAATTCCA'

contexto = deque([], 4)  # for keeping the last 4 lines

input_file_list = ["input1.txt", "input2.txt", "input3.txt"]
output_file = "output.txt"

with open(output_file, "w") as f_out:
    for input_file in input_file_list:
        contexto.clear()  # start each file with an empty window
        with open(input_file) as f_in:
            for line in f_in:
                contexto.append(line)
                if len(contexto) < 4:
                    continue
                if buscado in contexto[1]:
                    f_out.writelines(contexto)
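
If the list of inputs is long, typing every name by hand gets tedious. Below is a rough sketch that collects the files with glob instead, assuming they all match a pattern such as input*.txt and that the output file name does not match it. The helper names find_input_files and extract_to_single_file are hypothetical.

```python
import glob
from collections import deque


def find_input_files(pattern="input*.txt"):
    """Return the matching file names in plain lexicographic order."""
    # Note: lexicographic order puts input10.txt before input2.txt.
    return sorted(glob.glob(pattern))


def extract_to_single_file(input_files, output_file, target, window=4):
    """Append every matching `window`-line block from all inputs to one file."""
    with open(output_file, "w") as f_out:  # opened once, so nothing is erased
        for input_file in input_files:
            context = deque(maxlen=window)  # new, empty window for each file
            with open(input_file) as f_in:
                for line in f_in:
                    context.append(line)
                    if len(context) == window and target in context[1]:
                        f_out.writelines(context)
```

Usage would be along the lines of extract_to_single_file(find_input_files(), "output.txt", "TCGCCATCCGAATTCCA").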

4 Comments

This solution will re-initialize the output file on every iteration, erasing the results previously stored. You need to place the open(output_file, "w") statement outside the loop, or use the "a" flag instead of "w".
@C-3PO I cannot follow: the line output_file = input_file.replace("input", "output") creates a new string based on the name of the input file, so each iteration over input_file_list produces an output file name like output1.txt, output2.txt and so on, without overwriting any previous file.
Oh yeah, that's fine. I need a coffee :)
@C-3PO Having given it some thought, I see what you mean. It is unclear to me whether the author wants one output file or separate ones.
