
Possible Duplicate:
Setting smaller buffer size for sys.stdin?

I have a Python (2.4/2.7) script using fileinput to read from standard input or from files. It's easy to use, and works well except for one case:

tail -f log | filter.py

The problem is that my script buffers its input, whereas (at least in this case) I want to see its output right away. This seems to stem from the fact that fileinput uses readlines() to grab up to its bufsize worth of bytes before it does anything. I tried using a bufsize of 1 and it didn't seem to help (which was somewhat surprising).

I did find that I can write code like this which does not buffer:

import sys

while True:
    line = sys.stdin.readline()
    if not line:
        break
    sys.stdout.write(line)

The problem with doing it this way is that I lose the fileinput functionality (namely that it automatically opens all the files passed to my program, or stdin if none, and it can even decompress input files automatically).

So how can I have the best of both? Ideally something where I don't need to explicitly manage my input file list (including decompression), and yet which doesn't delay input when used in a "streaming" way.
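To make the question concrete, here is a hand-rolled sketch of the behaviour I want (the helper name input_lines is made up, not part of fileinput; it reads one readline() at a time so output isn't held back by block buffering, and it guesses gzip from a .gz extension; written against a modern Python 3, where gzip.open takes a 'rt' text mode):

```python
import gzip
import sys

def input_lines(paths):
    """Yield lines from each path in turn (stdin if the list is empty),
    one readline() at a time, to avoid block-buffering delays."""
    for path in paths or ['-']:
        if path == '-':
            f = sys.stdin
        elif path.endswith('.gz'):
            f = gzip.open(path, 'rt')  # transparent decompression, like fileinput
        else:
            f = open(path, 'r')
        try:
            while True:
                line = f.readline()
                if not line:
                    break
                yield line
        finally:
            if f is not sys.stdin:
                f.close()
```

But this is exactly the boilerplate fileinput is supposed to save me from writing.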

  • Close the stdin filehandle and reopen it with buffering = 0 (I haven't tried it, so I'm not going to post it as an answer). Commented May 17, 2011 at 16:11
  • stackoverflow.com/questions/3670323/… Commented May 17, 2011 at 16:25
  • You might be mischaracterizing the situation somewhat by saying fileinput uses readlines(). By default, readlines() doesn't return until it hits EOF, whereas 'for line in fileinput.input():' and 'for line in sys.stdin:' will eventually return something once they get enough characters buffered. You could be right that fileinput uses readlines() internally, though, if it passes a bufsize argument. Commented Feb 3, 2016 at 4:09
  • I just filed bug report bugs.python.org/issue26290 "fileinput and 'for line in sys.stdin' do strange mockery of input buffering" which includes the behavior you've observed. Summary: fileinput is broken in both 2.7 and 3.4, "for line in sys.stdin:" is broken in 2.7 but fixed in 3.4, readline works properly in both 2.7 and 3.4. Commented Feb 5, 2016 at 2:55

2 Answers


Try running python -u; the man page says it will "force stdin, stdout and stderr to be totally unbuffered".

You can just alter the shebang on the first line of filter.py.
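For example, the first line might become the following (assuming the interpreter lives at /usr/bin/python; note that `#!/usr/bin/env python -u` is not reliable, because Linux passes everything after the interpreter path as a single argument):

```python
#!/usr/bin/python -u
# -u makes stdin, stdout and stderr unbuffered for the whole script
```

Setting the PYTHONUNBUFFERED environment variable to a non-empty value has the same effect when you can't edit the script.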


4 Comments

Note that there is internal buffering in xreadlines(), readlines() and file-object iterators ("for line in sys.stdin") which is not influenced by this option.
Yeah, for the reason tMC stated, this doesn't work. I did try it, though.
Then don't use line-based I/O. Use plain stdin.read().
readline() (singular) works just fine. It's only readlines() (plural) that does the buffering I don't want. I imagine raw read() would work too, but it's not necessary in this case.

Have you tried:

import fileinput

def hook_nobuf(filename, mode):
    return open(filename, mode, 0)

fi = fileinput.FileInput(openhook=hook_nobuf)

I haven't tested it, but from reading what the openhook parameter does, and what passing 0 to open() as the bufsize argument does, this should do the trick.
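A self-contained sketch of the same idea (the cat_unbuffered name is mine, not part of fileinput; note that Python 3 only allows an unbuffered open() in binary mode, so you must pass mode='rb' to FileInput there, and lines come back as bytes):

```python
import fileinput

def hook_nobuf(filename, mode):
    # bufsize 0 asks for an unbuffered file object; Python 3 only
    # permits this in binary mode, hence mode='rb' below.
    return open(filename, mode, 0)

def cat_unbuffered(paths):
    """Yield lines from the given files via an unbuffered FileInput."""
    for line in fileinput.FileInput(files=paths, mode='rb',
                                    openhook=hook_nobuf):
        yield line
```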

3 Comments

This has no effect. Again the problem seems to be that fileinput uses the readlines() method and buffers internally.
Well, I think that's your answer then. Either don't use fileinput, or starting with fileinput.py as a base, rewrite it to not buffer internally. Looking at the code, there doesn't seem to be any way to make it not do at least SOME buffering just by passing parameters to it.
I'm new to Python, and it's surprising that this use case isn't well covered; writing text filters in Python would otherwise seem very natural.

