
Possible Duplicate:
Setting smaller buffer size for sys.stdin?

I have a Python (2.4/2.7) script using fileinput to read from standard input or from files. It's easy to use, and works well except for one case:

tail -f log | filter.py

The problem is that my script buffers its input, whereas (at least in this case) I want to see its output right away. This seems to stem from the fact that fileinput uses readlines() to grab up to its bufsize worth of bytes before it does anything. I tried using a bufsize of 1 and it didn't seem to help (which was somewhat surprising).

I did find that I can write code like this which does not buffer:

import sys

while True:
    line = sys.stdin.readline()
    if not line:
        break
    sys.stdout.write(line)

The problem with doing it this way is that I lose the fileinput functionality (namely that it automatically opens all the files passed to my program, or stdin if none, and it can even decompress input files automatically).

So how can I have the best of both? Ideally something where I don't need to explicitly manage my input file list (including decompression), and yet which doesn't delay input when used in a "streaming" way.
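To make the question concrete, here is a hand-rolled sketch of the behaviour I want (the helper name input_lines is made up, not part of fileinput; it reads one readline() at a time so output isn't held back by block buffering, and it guesses gzip from a .gz extension; written against a modern Python 3, where gzip.open takes a 'rt' text mode):

```python
import gzip
import sys

def input_lines(paths):
    """Yield lines from each path in turn (stdin if the list is empty),
    one readline() at a time, to avoid block-buffering delays."""
    for path in paths or ['-']:
        if path == '-':
            f = sys.stdin
        elif path.endswith('.gz'):
            f = gzip.open(path, 'rt')  # transparent decompression, like fileinput
        else:
            f = open(path, 'r')
        try:
            while True:
                line = f.readline()
                if not line:
                    break
                yield line
        finally:
            if f is not sys.stdin:
                f.close()
```

But this is exactly the boilerplate fileinput is supposed to save me from writing.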

  • Close the stdin filehandle and reopen it with buffering = 0 (I haven't tried it, so I'm not going to post it as an answer). Commented May 17, 2011 at 16:11
  • stackoverflow.com/questions/3670323/… Commented May 17, 2011 at 16:25
  • You might be mischaracterizing the situation somewhat by saying fileinput uses readlines(). By default, readlines() doesn't return until it hits EOF, whereas 'for line in fileinput.input():' and 'for line in sys.stdin:' will eventually return something once they get enough characters buffered. You could be right that fileinput uses readlines() internally, though, if it passes a bufsize argument. Commented Feb 3, 2016 at 4:09
  • I just filed bug report bugs.python.org/issue26290 "fileinput and 'for line in sys.stdin' do strange mockery of input buffering" which includes the behavior you've observed. Summary: fileinput is broken in both 2.7 and 3.4, "for line in sys.stdin:" is broken in 2.7 but fixed in 3.4, readline works properly in both 2.7 and 3.4. Commented Feb 5, 2016 at 2:55

2 Answers


Try running python -u; the man page says it will "force stdin, stdout and stderr to be totally unbuffered".

You can just alter the shebang on the first line of filter.py.
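For example, the first line might become the following (assuming the interpreter lives at /usr/bin/python; note that `#!/usr/bin/env python -u` is not reliable, because Linux passes everything after the interpreter path as a single argument):

```python
#!/usr/bin/python -u
# -u makes stdin, stdout and stderr unbuffered for the whole script
```

Setting the PYTHONUNBUFFERED environment variable to a non-empty value has the same effect when you can't edit the script.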


4 Comments

Note that there is internal buffering in xreadlines(), readlines() and file-object iterators ("for line in sys.stdin") which is not influenced by this option.
Yeah, for the reason tMC stated, this doesn't work. I did try it, though.
Then don't use line-based I/O. Use plain stdin.read().
readline() (singular) works just fine. It's only readlines() (plural) that does the buffering I don't want. I imagine raw read() would work too, but it's not necessary in this case.

Have you tried:

import fileinput

def hook_nobuf(filename, mode):
    return open(filename, mode, 0)

fi = fileinput.FileInput(openhook=hook_nobuf)

I haven't tested it, but from reading what the openhook parameter does, and what passing 0 to open() as the bufsize argument does, this should do the trick.
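A self-contained sketch of the same idea (the cat_unbuffered name is mine, not part of fileinput; note that Python 3 only allows an unbuffered open() in binary mode, so you must pass mode='rb' to FileInput there, and lines come back as bytes):

```python
import fileinput

def hook_nobuf(filename, mode):
    # bufsize 0 asks for an unbuffered file object; Python 3 only
    # permits this in binary mode, hence mode='rb' below.
    return open(filename, mode, 0)

def cat_unbuffered(paths):
    """Yield lines from the given files via an unbuffered FileInput."""
    for line in fileinput.FileInput(files=paths, mode='rb',
                                    openhook=hook_nobuf):
        yield line
```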

3 Comments

This has no effect. Again the problem seems to be that fileinput uses the readlines() method and buffers internally.
Well, I think that's your answer then. Either don't use fileinput, or starting with fileinput.py as a base, rewrite it to not buffer internally. Looking at the code, there doesn't seem to be any way to make it not do at least SOME buffering just by passing parameters to it.
I'm new to Python, and it's surprising that this use case isn't well covered; writing text filters in Python would otherwise seem very natural.

