
I have a Python script that processes a huge text file (around 4 million lines) and writes the data into two separate files.

I have added a print statement that outputs a string for every line, for debugging. How bad could this be from a performance perspective?

If it is going to be very bad, I can remove the debugging line.

Edit

It turns out that printing a line for each of the 4 million lines in the file increases the run time far too much.

Comments
  • timeit docs.python.org/2/library/timeit.html Commented Nov 8, 2012 at 11:29
  • It will be slower because you are performing a large number of prints; any extra processing incurs some performance penalty. Commented Nov 8, 2012 at 12:09
  • Send each item to a socket queue: the program will finish the writes first, and the console reading from the socket will print the output with a lag. Commented Jul 7, 2020 at 14:04
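The last comment's idea of handing output to a separate consumer can also be done in-process. Below is a minimal Python 3 sketch that swaps the socket for a queue.Queue drained by a background threading.Thread; start_printer and stop_printer are hypothetical helper names, not an existing API. It only moves printing off the main loop: the total amount of console output is unchanged, and the printer thread still needs the GIL for much of its work.

```python
import queue
import threading

_STOP = object()  # sentinel: tells the printer thread to exit


def start_printer(write=print, maxsize=0):
    """Start a background thread that passes each queued message to write().

    maxsize=0 means an unbounded queue; a positive value makes q.put() block
    when the printer falls that far behind.
    """
    q = queue.Queue(maxsize)

    def drain():
        while True:
            msg = q.get()
            if msg is _STOP:
                break
            write(msg)

    # daemon=True: the thread won't keep the process alive on its own,
    # so call stop_printer() or any unprinted backlog is lost at exit.
    t = threading.Thread(target=drain, daemon=True)
    t.start()
    return q, t


def stop_printer(q, t):
    """Ask the printer thread to finish and wait until the backlog is written."""
    q.put(_STOP)
    t.join()
```

Usage: call q, t = start_printer() before the loop, use q.put(str(item)) in place of the print, and call stop_printer(q, t) at the end. With 4 million lines an unbounded queue can grow very large if the console falls behind, so a positive maxsize may be the safer choice, at the cost of the main loop blocking whenever the queue is full.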

2 Answers


I tried it in a very simple script just for fun, and the difference is quite staggering:

In large.py:

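# Python 2: write 4 million numbered lines, printing each one as a debug trace
with open('target.txt', 'w') as target:
    for item in xrange(4000000):
        target.write(str(item) + '\n')
        print item  # the debug print being measured; commented out for the second run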

Timing it:

[gp@imdev1 /tmp]$ time python large.py
real    1m51.690s
user    0m10.531s
sys     0m6.129s

[gp@imdev1 /tmp]$ ls -lah target.txt 
-rw-rw-r--. 1 gp gp 30M Nov  8 16:06 target.txt

Now running the same with "print" commented out:

[gp@imdev1 /tmp]$ time python large.py 
real    0m2.584s
user    0m2.536s
sys     0m0.040s
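In the first run, real time (1m51s) is far larger than user plus sys time (about 17s), which suggests most of the run was spent waiting for the terminal to draw the output rather than computing. A common remedy, also raised in the comments, is to send stdout to a file; from the shell that is python large.py > debug.log. Below is a minimal Python 3 sketch of the same redirection done inside the script with contextlib.redirect_stdout. The names process and run_with_debug_log are hypothetical, and the default debug.log location in the system temp directory is an assumption.

```python
import contextlib
import os
import tempfile


def process(n):
    """Hypothetical stand-in for the script's loop, with its debug print left in."""
    for item in range(n):
        print(item)


def run_with_debug_log(n, log_path=None):
    """Run process(n) with everything it prints sent to a file instead of the terminal.

    log_path defaults to a hypothetical debug.log in the system temp directory.
    Returns the path that was written.
    """
    if log_path is None:
        log_path = os.path.join(tempfile.gettempdir(), "debug.log")
    # print() looks up sys.stdout at call time, so the unchanged print
    # calls inside process() now write to the (buffered) file.
    with open(log_path, "w") as log, contextlib.redirect_stdout(log):
        process(n)
    return log_path
```

The design choice here is to leave the debug print untouched and change only where stdout points, so the output can still be inspected afterwards in an editor.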

Comments

And what happens when you comment out the write, leave the print in, and run it with > target.txt?
@Tim: Oddly enough, it ran faster, but my machine may be less busy than when I ran it earlier; I don't have time right now to run it many times for a more statistically sound comparison.
[gp@imdev1 /tmp]$ time python large.py > target.txt
real    0m1.954s
user    0m1.897s
sys     0m0.049s
Redirecting stdout to a file is much faster. In fact, you can redirect to a file and open that file in an editor in less time than it takes to spew a large amount of I/O to the screen.
@GSP Thanks. It looks like I should remove the print statements.
I was also wondering whether a verbose option that checks a bool before deciding to print hurts the run time much. Using if False: print item, it ran in 1.417s; without any print, it ran in 1.357s.
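The if False measurement suggests a guarded print costs little when disabled. One way to make that guard configurable is sketched below in Python 3, assuming the standard logging module is acceptable in place of a hand-rolled verbose flag: check the logger's level once before the loop. The logger name large_file_split and the function split_lines (which routes alternate lines into two lists, standing in for the question's two output files) are hypothetical.

```python
import logging

logger = logging.getLogger("large_file_split")  # hypothetical logger name


def split_lines(lines):
    """Hypothetical stand-in for the question's loop: route lines into two lists."""
    evens, odds = [], []
    # Check the level once, outside the hot loop, like the comment's bool flag.
    debug = logger.isEnabledFor(logging.DEBUG)
    for i, line in enumerate(lines):
        (evens if i % 2 == 0 else odds).append(line)
        if debug:
            # %-style arguments are stored on the record and only formatted
            # when a handler formats it, not at the call site.
            logger.debug("line %d: %s", i, line)
    return evens, odds
```

With no logging configuration the effective level is WARNING, so the debug branch is skipped; calling logging.basicConfig(level=logging.DEBUG) turns the trace back on without editing the loop.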

Yes, it affects performance. I wrote a small program to demonstrate:

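import time

start_time = time.time()
for i in range(100):
    for j in range(100):
        for k in range(100):
            print(i, j, k)  # replaced by pass for the second measurement
print(time.time() - start_time)  # elapsed wall-clock seconds
input()  # keeps the console window open until Enter is pressed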

The measured time was 160.2812204496765 seconds. Then I replaced the print statement with pass, and the result was shocking: the measured time without print was 0.26517701148986816 seconds.
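A single time.time() measurement is sensitive to whatever else the machine is doing, a point also raised in the comments on the other answer. Below is a minimal Python 3 sketch using timeit.repeat (the module linked in the first comment on the question) that takes the best of several runs; nested_loop, best_of and compare are hypothetical names. Printing into an io.StringIO buffer measures only Python's own formatting and writing cost, not the terminal rendering that dominates the numbers above, so pass make_out=lambda: sys.stdout to include it.

```python
import io
import timeit


def nested_loop(n=100, out=None):
    """The answer's n x n x n loop; prints i, j, k to `out` only when one is given."""
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # The None check runs on every iteration, so the "without print"
                # case is slightly slower than a bare pass.
                if out is not None:
                    print(i, j, k, file=out)


def best_of(fn, repeat=3):
    """Smallest time over several single runs of fn, to damp noise from a busy machine."""
    return min(timeit.repeat(fn, number=1, repeat=repeat))


def compare(n=100, repeat=3, make_out=io.StringIO):
    """Return (seconds without print, seconds with print), each the best of `repeat` runs."""
    without_print = best_of(lambda: nested_loop(n), repeat)
    # make_out() is called once per run; the io.StringIO default captures only
    # Python's formatting/writing cost, not the time a terminal spends rendering.
    with_print = best_of(lambda: nested_loop(n, make_out()), repeat)
    return without_print, with_print
```

Usage: print(compare()) reports both timings for the original 100 x 100 x 100 loop.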
