
How do I read sys.stdin while ignoring decoding errors? I know that sys.stdin.buffer exists, and I can read the binary data and then decode it with .decode('utf8', errors='ignore'), but I want to read sys.stdin line by line. Maybe I can somehow reopen sys.stdin with the errors='ignore' option?
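Here is a minimal sketch of the bytes-then-decode idea applied one line at a time. It assumes the input is meant to be UTF-8. Iterating over a binary stream such as sys.stdin.buffer yields raw lines split on b"\n", and each line can be decoded separately with the chosen error handler. The helper name decoded_lines is hypothetical. Splitting before decoding is safe for UTF-8 because the newline byte 0x0A never occurs inside a multi-byte sequence.

```python
from typing import Iterable, Iterator


def decoded_lines(lines: Iterable[bytes], errors: str = "ignore") -> Iterator[str]:
    """Decode each raw line as UTF-8 using the given error handler.

    "ignore" drops bytes that are not valid UTF-8; "replace" would
    substitute U+FFFD for them instead.
    """
    for raw_line in lines:
        yield raw_line.decode("utf-8", errors=errors)
```

Since a binary stream iterates as bytes lines, `for line in decoded_lines(sys.stdin.buffer): ...` reads standard input line by line without ever raising UnicodeDecodeError.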

  • What about putting the decode inside a try and handling the decoding errors as exceptions? Commented Aug 12, 2022 at 14:18
  • @SembeiNorimaki, how can that help? I need to call sys.stdin.read(), or more specifically use for line in sys.stdin, but it throws a UnicodeDecodeError. If I catch it, how can I still read the line? I just need to ignore the characters it can't decode. The lines mostly contain ASCII characters but can also contain non-ASCII ones, so I need to either ignore them or replace them with '?', for example. Commented Aug 12, 2022 at 14:44
  • If you cannot decode it, you have to find out why. Give us an example of the input that causes the decode error. Maybe some inputs are in a different encoding; we need some examples to see how to solve it. Commented Aug 12, 2022 at 14:53
  • @SembeiNorimaki, the data doesn't matter. I want to accept any data, including pure binary data (even though it is often text). I don't need to decode all of it; I want to ignore whatever can't be decoded, the way the bytes.decode function can. With an actual file I could do open(filename, 'r', errors='ignore'), but I want to read sys.stdin instead, which is already an open file, so I don't know how to set the errors='ignore' option. Commented Aug 12, 2022 at 15:12
  • Then put the decode inside a try, with an except that just passes, so any data that fails to decode is ignored. Commented Aug 12, 2022 at 16:10
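The try/except suggestion works per decoded chunk, so a line containing even one undecodable byte is lost entirely. The behaviour the asker describes (keep the line, replace bad bytes with '?') can be sketched with a custom codec error handler, assuming UTF-8 input. codecs.register_error is the standard-library hook for this; the handler name "question_mark" and the helper functions below are hypothetical. Note that the built-in "replace" handler substitutes U+FFFD, not '?', when decoding.

```python
import codecs
from typing import Optional, Tuple


def question_mark(exc: UnicodeDecodeError) -> Tuple[str, int]:
    """Error handler: emit "?" for the invalid sequence and resume after it."""
    return "?", exc.end


# Register under a name of our choosing; the name can then be passed as
# errors= to bytes.decode, open or io.TextIOWrapper.
codecs.register_error("question_mark", question_mark)


def decode_or_skip(raw: bytes) -> Optional[str]:
    """The try/except approach: one bad byte loses the whole line."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


def decode_with_placeholder(raw: bytes) -> str:
    """Keep the line; each invalid sequence becomes a single "?"."""
    return raw.decode("utf-8", errors="question_mark")
```

For b"a\x80b\n", decode_or_skip returns None, while decode_with_placeholder returns "a?b\n".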

2 Answers


I found three solutions here, as Mark Setchell mentioned.

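import sys
import io

# 1. Open a new text file object on stdin's file descriptor.
#    closefd=False keeps fd 0 open after the with-block exits.
def first():
    with open(sys.stdin.fileno(), 'r', encoding='utf-8', errors='ignore',
              closefd=False) as f:
        return f.read()

# 2. Replace sys.stdin with a new text wrapper around its binary buffer.
def second():
    sys.stdin = io.TextIOWrapper(sys.stdin.buffer, encoding='utf-8', errors='ignore')
    return sys.stdin.read()

# 3. Change the error handler of the existing sys.stdin in place (Python 3.7+).
#    Only errors changes; the stream keeps its current encoding.
def third():
    sys.stdin.reconfigure(errors='ignore')
    return sys.stdin.read()


# Use exactly one of these: each one consumes all of standard input.
print(first())
#print(second())
#print(third())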

Usage:

$ printf 'a\200b\n' | python solution.py
ab
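The functions above read all of stdin at once, while the question asks for line-by-line reading. Once the error handler is set, the text stream can simply be iterated. This is a sketch assuming Python 3.7+ (for reconfigure); the helper name lines_ignoring_errors is hypothetical, and it takes the stream as a parameter so it can also be tried on an in-memory stream.

```python
import io
from typing import Iterator


def lines_ignoring_errors(stream: io.TextIOWrapper) -> Iterator[str]:
    """Switch an already-open text stream to errors='ignore' and yield its lines.

    Passing only errors to reconfigure() keeps the stream's current encoding.
    """
    stream.reconfigure(errors="ignore")
    for line in stream:
        yield line.rstrip("\n")
```

Called as lines_ignoring_errors(sys.stdin), it yields stdin's lines with undecodable bytes dropped, using whatever encoding sys.stdin already has.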

You can set an error handler in the PYTHONIOENCODING environment variable. This affects both sys.stdin and sys.stdout (sys.stderr always uses "backslashreplace"). PYTHONIOENCODING accepts an optional encoding name and an optional error handler name preceded by a colon, so "UTF8", "UTF8:ignore" and ":ignore" are all valid values.

$  cat so73335410.py
import sys

if __name__ == '__main__':
    data = sys.stdin.read()
    print(data)
$
$  echo hello | python so73335410.py
hello

$  echo hello hello hello hello | zip > hello.zip
  adding: - (deflated 54%)
$
$  cat hello.zip | PYTHONIOENCODING=UTF8:ignore python so73335410.py
UYv>
  -▒
UY  HW@'PKv>

  ▒-PK,-/>PKmPK/>
$ 
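The same effect can be checked from Python by launching a child interpreter with PYTHONIOENCODING set in its environment. This is a sketch assuming a POSIX system (so no newline translation on stdout) and a child that simply echoes its stdin; run_with_ioencoding is a hypothetical helper, not part of any library.

```python
import os
import subprocess
import sys


def run_with_ioencoding(payload: bytes, ioencoding: str = "utf-8:ignore") -> bytes:
    """Run a child Python that echoes stdin, with PYTHONIOENCODING set.

    The child reads sys.stdin as text, so the error handler named after the
    colon decides what happens to bytes that are not valid in the encoding.
    """
    child_code = "import sys; sys.stdout.write(sys.stdin.read())"
    env = dict(os.environ, PYTHONIOENCODING=ioencoding)
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        input=payload,
        stdout=subprocess.PIPE,
        env=env,
        check=True,
    )
    return result.stdout
```

With the default "utf-8:ignore", run_with_ioencoding(b"a\x80b\n") returns b"ab\n": the invalid byte is silently dropped by the child.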
