How to set multiple input files in Python code

Question

I am using this code to search for a target_string in a single input file (input.txt) and extract the lines around each match into an output file (output.txt). Now I want to perform the same procedure with several input files, for instance input1.txt, input2.txt, input3.txt, ...

How can I modify this code to do this?

from collections import deque
input_file = 'input.txt' 
output_file = 'output11.txt' 
buscado = 'TCGCCATCCGAATTCCA'

contexto = deque([], 4)  # for keeping the last 4 lines


with open(input_file) as f_in, open(output_file, "w") as f_out:
  # A for loop over `f_in` yields one line at a time
  for line in f_in:
    contexto.append(line)       
    if  len(contexto) < 4:      
      continue
    if buscado in contexto[1]:  
      f_out.writelines(contexto)

Does anyone have any suggestions? I've been struggling for hours :C

chepner · Accepted Answer · 2022-03-04 16:39:36Z

6

Consider using the fileinput module.

import fileinput
from collections import deque
output_file = 'output11.txt' 
buscado = 'TCGCCATCCGAATTCCA'

contexto = deque([], 4)  # for keeping the last 4 lines


with open(output_file, "w") as f_out:
    for line in fileinput.input(files=["input1.txt", "input2.txt"]):
        contexto.append(line)       
        if len(contexto) < 4:      
            continue
        if buscado in contexto[1]:  
            f_out.writelines(contexto)
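One thing to watch for with a single shared deque: the last lines of one file are still in the window when the next file starts, so a 4-line block can straddle two files. The following is a minimal sketch, assuming that blocks should never cross a file boundary; the function name `extract_context` and its parameters are hypothetical. It uses `isfirstline()` from fileinput to clear the window whenever a new file begins.

```python
import fileinput
from collections import deque


def extract_context(input_files, output_file, buscado, window=4):
    """Write every `window`-line block whose second line contains `buscado`.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    contexto = deque([], window)  # keeps the last `window` lines
    with open(output_file, "w") as f_out, \
            fileinput.input(files=input_files) as f_in:
        for line in f_in:
            if f_in.isfirstline():
                # A new file has started: drop lines left over from the previous one
                contexto.clear()
            contexto.append(line)
            if len(contexto) == window and buscado in contexto[1]:
                f_out.writelines(contexto)
```

Calling `extract_context(["input1.txt", "input2.txt"], "output11.txt", "TCGCCATCCGAATTCCA")` would mirror the snippet above, with the reset added. If blocks spanning two files are actually wanted, the `clear()` call can simply be dropped.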

answered Mar 4, 2022 at 16:39

chepner

jackal · Accepted Answer · 2022-03-04 16:42:05Z

3

Have you considered multithreading? You could do it like this:

from concurrent.futures import ThreadPoolExecutor

BUSCADO = 'TCGCCATCCGAATTCCA'

def process(fnum):
    with open(f'input{fnum}.txt') as infile:
        lines = infile.readlines()
        with open(f'output{fnum}.txt', 'w') as outfile:
            for line in lines[4:]:
                if BUSCADO in line:
                    outfile.write(line)

def main():
    with ThreadPoolExecutor() as executor:
        executor.map(process, range(1, 4))

if __name__ == '__main__':
    main()
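Note that the snippet above writes only the matching lines and skips the first four lines of each file, which differs from the 4-line window in the question. Below is a sketch of a threaded variant that keeps the question's window logic, assuming one output file per input file. The names `process`, `run`, and the `pairs` argument are illustrative, not part of any library. Each call gets its own deque, so threads share no state, and collecting `result()` from every future means an exception such as a missing file is raised instead of being silently dropped.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor

BUSCADO = 'TCGCCATCCGAATTCCA'


def process(in_path, out_path, buscado=BUSCADO):
    """Copy each 4-line block whose second line contains `buscado`."""
    contexto = deque([], 4)  # per-call window, not shared between threads
    matches = 0
    with open(in_path) as infile, open(out_path, 'w') as outfile:
        for line in infile:
            contexto.append(line)
            if len(contexto) == 4 and buscado in contexto[1]:
                outfile.writelines(contexto)
                matches += 1
    return matches


def run(pairs, max_workers=3):
    """Process (input, output) path pairs in a thread pool.

    Returns the number of blocks written for each pair, in order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(process, i, o) for i, o in pairs]
        # result() re-raises any exception that occurred in the worker
        return [f.result() for f in futures]
```

For the files in the question this could be called as `run([(f'input{n}.txt', f'output{n}.txt') for n in range(1, 4)])`. Passing `max_workers` explicitly makes the upper bound on the number of threads visible in the code.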

answered Mar 4, 2022 at 16:42

jackal

3 Comments

C-3PO Over a year ago

Thanks, that's an interesting approach. How many threads are called by ThreadPoolExecutor()?

jackal Over a year ago

@C-3PO In this case it will be three

Sebastian Over a year ago

I can't understand what you have done hahah, I have never heard of "multithreading"

C-3PO · Accepted Answer · 2022-03-04 16:32:32Z

From what I understood, you need to repeat the search procedure, but now scanning multiple input files. In that case, you can create a nested for-loop for the input files:

from collections import deque
all_input_files = ['input.txt'] # add new files here
output_file     = 'output11.txt' 
buscado         = 'TCGCCATCCGAATTCCA'

contexto        = deque([], 4)  # for keeping the last 4 lines

with open(output_file, "w") as f_out:
    for input_file in all_input_files:
        with open(input_file,"r") as f_in:
            # A for loop over `f_in` yields one line at a time
            for line in f_in:
                contexto.append(line)       
                if  len(contexto) < 4:      
                    continue
                if buscado in contexto[1]:  
                    f_out.writelines(contexto)
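If the input files follow a naming pattern such as input1.txt, input2.txt, ..., the `all_input_files` list could also be built from the directory contents instead of being typed by hand. This is a sketch under that assumption; the helper `find_inputs` and its default pattern are hypothetical. It sorts by the number in the file name, because a plain alphabetical sort would place input10.txt before input2.txt.

```python
import re
from pathlib import Path


def find_inputs(directory=".", pattern="input*.txt"):
    """Return the files matching `pattern`, ordered by the number in their name.

    Hypothetical helper: the pattern is an assumption about the file naming.
    """
    def numeric_key(path):
        digits = re.search(r"(\d+)", path.stem)
        # Files without a number (e.g. input.txt) sort first
        return (int(digits.group(1)) if digits else -1, path.name)

    return sorted(Path(directory).glob(pattern), key=numeric_key)
```

The result can be used directly, for example `all_input_files = find_inputs()`, since `open()` accepts `Path` objects.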

mnikley · Accepted Answer · 2022-03-04 16:45:05Z

1

For example:

from collections import deque
buscado = 'TCGCCATCCGAATTCCA'

contexto = deque([], 4)  # for keeping the last 4 lines

input_file_list = ["input1.txt", "input2.txt", "input3.txt"]

for input_file in input_file_list:
    output_file = input_file.replace("input", "output")
    with open(input_file) as f_in, open(output_file, "w") as f_out:
      # A for loop over `f_in` yields one line at a time
      for line in f_in:
        contexto.append(line)
        if  len(contexto) < 4:
          continue
        if buscado in contexto[1]:
          f_out.writelines(contexto)
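A small caveat on deriving the output name with `str.replace`: if the path includes a directory whose name also contains "input", that part gets rewritten too. A possible way around this, sketched here with a hypothetical helper `output_path_for`, is to change only the file name using pathlib.

```python
from pathlib import Path


def output_path_for(input_file):
    """Map e.g. data/input3.txt to data/output3.txt (hypothetical helper)."""
    path = Path(input_file)
    # Only the file name is rewritten; parent directories are left untouched
    return path.with_name(path.name.replace("input", "output", 1))
```

In the loop above, `output_file = output_path_for(input_file)` would then take the place of the `replace` call.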

Edit

This solution creates one output file per input file, each named after its input file. After some discussion, it seems you probably want to write all results to a single file, which requires slightly different code, similar to the other answers:

from collections import deque

buscado = 'TCGCCATCCGAATTCCA'

contexto = deque([], 4)  # for keeping the last 4 lines

input_file_list = ["input1.txt", "input2.txt", "input3.txt"]
output_file = "output.txt"

with open(output_file, "w") as f_out:
    for input_file in input_file_list:
        with open(input_file) as f_in:
            for line in f_in:
                contexto.append(line)
                if len(contexto) < 4:
                    continue
                if buscado in contexto[1]:
                    f_out.writelines(contexto)

edited Mar 4, 2022 at 16:45

answered Mar 4, 2022 at 16:30

mnikley

4 Comments

C-3PO Over a year ago

This solution will re-initialize the output file every iteration; erasing the results previously stored. You need to place the open(output_file, "w") statement outside, or use the "a" flag instead of "w".

mnikley Over a year ago

@C-3PO i cannot follow - the line output_file = input_file.replace("input", "output") creates a new string depending on the name of the input file, so in each iteration over input_file_list you will have output file names like output1.txt, output2.txt and so on - without overwriting any previous file

C-3PO Over a year ago

Oh yeah, that's fine. I need a coffee :)

mnikley Over a year ago

@C-3PO giving it some thought - i see what you mean. It is unclear to me if the author wants one output file or separate ones.
