I am sniffing network packets using Tshark (the command-line version of Wireshark) and writing them to a file as I receive them. My code is similar to the following:

import json
from queue import PriorityQueue

documents = PriorityQueue(maxsize=0)  # filled by the Tshark producer thread
writing_enabled = True
with open("output.txt", 'w') as opened_file:
    while writing_enabled:
        try:
            # Wait up to 1 second for the next document
            data = documents.get(timeout=1)
        except Exception as e:
            # No document pushed by producer thread
            continue
        opened_file.write(json.dumps(data) + "\n")

When I receive data from the Tshark thread, I put it into a queue, and another thread writes it to a file using the code above. However, once the file reaches 600+ MB, the process slows down and its status changes to Not Responding. After some research, I think this is caused by the default buffering mechanism of open. Is it reasonable to change with open("output.txt", 'w') as opened_file: to with open("output.txt", 'w', 1000) as opened_file: to use a 1000-byte buffer in write mode? Or is there another way to overcome this?

  • @Alderven Actually, I am opening the file only once. open is called once, so I don't think I overwrite it each time. Commented Sep 24, 2019 at 11:54
  • Have you tried to flush the buffer? tutorialspoint.com/python/file_flush.htm Commented Sep 24, 2019 at 11:55
  • Totally unrelated, but you definitely want to either restrict your except clause to the exact exception(s) you expect here or at least log the exceptions you catch. Otherwise, if something unexpected happens, you will never know. Commented Sep 24, 2019 at 12:06
  • How does the writing_enabled flag change its value? Is there a thread that changes it? Commented Sep 24, 2019 at 12:15
  • @brunodesthuilliers Thank you I will take care of that ^^ Commented Sep 24, 2019 at 12:16

1 Answer

To write the internal buffer to the file, you can call the file object's flush() method. However, this is normally handled automatically by Python and your operating system, which use a default buffer size. If you want to specify your own buffer size, you can open the file like this:

f = open('file.txt', 'w', buffering=bufsize)

Please also see the following question: How often does Python flush to file
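
To make the two options above concrete, here is a minimal sketch that opens the file with an explicit buffer size and also flushes every few records. The function name write_records and the parameters bufsize and flush_every are made up for illustration, and the values are arbitrary, so treat them as assumptions rather than recommendations:

```python
import json

def write_records(path, records, bufsize=64 * 1024, flush_every=100):
    """Write records as JSON lines using an explicit buffer size.

    bufsize and flush_every are illustrative values, not tuned ones.
    """
    written = 0
    # buffering > 1 sets the size in bytes of the underlying chunk buffer
    with open(path, "w", buffering=bufsize) as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
            written += 1
            if written % flush_every == 0:
                # Push Python's buffered data down to the operating system
                f.flush()
    return written
```

Note that flush() only hands the data to the operating system. If you need it to be physically on disk, you would additionally call os.fsync(f.fileno()), which is considerably slower.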

As an alternative to flushing the buffer, you could use rolling files, i.e. open a new file once the currently opened file exceeds a certain size. This is generally good practice if you intend to write a lot of data.
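
A rough sketch of the rolling-file idea could look like the following. The class name RollingWriter, the numbered file naming scheme, and the default size limit are all assumptions for illustration and not part of any library:

```python
import json

class RollingWriter:
    """Writes JSON lines and starts a new numbered file once max_bytes is reached."""

    def __init__(self, base_path, max_bytes=100 * 1024 * 1024):
        self.base_path = base_path
        self.max_bytes = max_bytes
        self.index = 0          # number of files opened so far
        self.file = None
        self.bytes_written = 0  # bytes written to the current file
        self._open_next()

    def _open_next(self):
        # Close the current file (if any) and open base_path.0, base_path.1, ...
        if self.file is not None:
            self.file.close()
        path = f"{self.base_path}.{self.index}"
        # newline="\n" disables newline translation so the byte count stays exact
        self.file = open(path, "w", encoding="utf-8", newline="\n")
        self.bytes_written = 0
        self.index += 1

    def write(self, record):
        line = json.dumps(record) + "\n"
        size = len(line.encode("utf-8"))
        # Roll over before writing if this line would exceed the limit
        if self.bytes_written > 0 and self.bytes_written + size > self.max_bytes:
            self._open_next()
        self.file.write(line)
        self.bytes_written += size

    def close(self):
        if self.file is not None:
            self.file.close()
            self.file = None
```

In the writer thread from the question, you would create one RollingWriter before the loop, call write(data) instead of opened_file.write(...), and call close() when writing_enabled becomes False. If your output can go through the logging module, the standard library's logging.handlers.RotatingFileHandler provides similar size-based rotation.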

1 Comment

Thank you for your answer. It's not related, but after hours of debugging it turns out that the status changes to Not Responding because a GUI component is overloaded. However, I will try your solution with timers too, to measure the performance differences.
