
I have the following two programs written in Python:

# cat.py
import sys

filename = sys.argv[1]

with open(filename, "rb") as f:
    while c := f.read(1024 * 1024):
        sys.stdout.buffer.write(c)

This program reads a file and writes its raw bytes to stdout.

The second program, main.py, is meant to read that data from stdin and print it as bytes.

import sys
import io
if __name__ == '__main__':
    print(sys.stdin.buffer.read(io.DEFAULT_BUFFER_SIZE))

However, I do not always get the file contents back. If I run this under Linux I get the exact contents, but if I run it under Windows I do not:

python cat.py .\inputs\input.bin | python main.py

Output on Windows (running under pwsh.exe):

0x3
0xc2
0xb7
0x55
0x12
0x20
0x66
0x67
0x50
0xc3
0x9e
0xc2
0xbd
0xd
0xa

Output on Linux (This is correct):

0x3
0xfa
0x55
0x12
0x20
0x66
0x67
0x50
0xe8
0xab

Any ideas why this may be the case? Is it line endings or something like that?

Also, if cat.py writes to a file rather than stdout, the correct contents are written to the file.


Update:

Okay, I have narrowed it down to a PowerShell issue. If I run this in cmd.exe I do not have any issues; if I run it under PowerShell I do.

  • Are we talking Python3 or Python2? Commented Sep 25, 2020 at 14:19
  • Sorry, I didn't mention it: I was using Python 3. Commented Sep 25, 2020 at 14:21

1 Answer


It is likely that the two command lines are set up with different encodings, which can result in different data streams.

Unfortunately, even if you read from stdin as binary, the data still has to pass through the shell's pipeline, and there is typically a system-wide encoding setting that affects it.

There is an answer that should help resolve this issue.
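
To make the effect concrete, here is a minimal sketch that models a shell which decodes the piped bytes as text, appends a line ending, and re-encodes them. The function name `through_text_pipeline` is made up for illustration, and the choice of code page 850 for decoding and UTF-8 for re-encoding is an assumption: it is not stated in the question, but it is consistent with the byte values shown there.

```python
# Bytes reported by the Linux run in the question (the real file contents).
original = bytes([0x03, 0xFA, 0x55, 0x12, 0x20, 0x66, 0x67, 0x50, 0xE8, 0xAB])

# Bytes reported by the Windows/PowerShell run in the question.
observed = bytes([0x03, 0xC2, 0xB7, 0x55, 0x12, 0x20, 0x66, 0x67, 0x50,
                  0xC3, 0x9E, 0xC2, 0xBD, 0x0D, 0x0A])


def through_text_pipeline(data: bytes, decode_as: str, encode_as: str) -> bytes:
    """Hypothetical model of a shell that treats piped output as text.

    The bytes are decoded into a string, a CRLF is appended (as a
    line-oriented pipeline would do), and the result is re-encoded
    for the next process.
    """
    text = data.decode(decode_as)
    return (text + "\r\n").encode(encode_as)


# Assumption: OEM code page 850 on the way in, UTF-8 on the way out.
simulated = through_text_pipeline(original, "cp850", "utf-8")

print(simulated.hex(" "))
print(simulated == observed)
```

Under these assumptions the simulated bytes match the Windows output from the question, including the trailing 0xd 0xa, which suggests the data was decoded and re-encoded as text rather than damaged by line-ending translation alone.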


2 Comments

I did have a look at the answer. It mentions using sys.stdin.buffer and sys.stdout.buffer, which I am already doing.
It has to do with the system-wide encoding. Even though both are using the same encoding, PowerShell corrupts the data because it misinterprets the encoding of the Python application's output. Python's default of UTF-8 is not understood by PowerShell. rkeithhill.wordpress.com/2010/05/26/…
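
Since cmd.exe passes the data through untouched, one possible workaround is to keep PowerShell out of the pipe altogether and let Python connect the two processes with an OS-level pipe. This is only a sketch and not something confirmed in this thread; `pipe_bytes` is a hypothetical helper name, and the two `-c` one-liners stand in for cat.py and main.py so the example runs on its own.

```python
import subprocess
import sys


def pipe_bytes(producer_cmd, consumer_cmd):
    """Run `producer | consumer` over a raw OS pipe; return the consumer's stdout bytes."""
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        consumer_cmd, stdin=producer.stdout, stdout=subprocess.PIPE
    )
    # Close our copy of the read end so the consumer sees EOF when the producer exits.
    producer.stdout.close()
    out, _ = consumer.communicate()
    producer.wait()
    return out


# Stand-ins for cat.py and main.py: write some non-ASCII bytes, then echo stdin.
payload = bytes([0x03, 0xFA, 0x55, 0xE8, 0xAB])
producer_cmd = [sys.executable, "-c",
                f"import sys; sys.stdout.buffer.write({payload!r})"]
consumer_cmd = [sys.executable, "-c",
                "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]

result = pipe_bytes(producer_cmd, consumer_cmd)
print(result == payload)
```

For the scripts in the question, the two command lists would become something like `[sys.executable, "cat.py", path]` and `[sys.executable, "main.py"]`. The bytes then travel from one process to the other without being decoded as text along the way.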
