How do I read sys.stdin, but ignoring decoding errors?
I know that sys.stdin.buffer exists, and I can read the binary data and then decode it with .decode('utf8', errors='ignore'), but I want to read sys.stdin line by line.
Maybe I can somehow reopen sys.stdin with the errors='ignore' option?
2 Answers
I found three solutions via the link Mark Setchell mentioned.
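import sys
import io

def first():
    # Reopen the stdin file descriptor as a text file that drops undecodable bytes.
    with open(sys.stdin.fileno(), 'r', errors='ignore') as f:
        return f.read()

def second():
    # Wrap the underlying binary buffer in a new text layer.
    sys.stdin = io.TextIOWrapper(sys.stdin.buffer, errors='ignore')
    return sys.stdin.read()

def third():
    # Change the error handler in place (requires Python 3.7+).
    sys.stdin.reconfigure(errors='ignore')
    return sys.stdin.read()

print(first())
#print(second())
#print(third())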
Usage:
$ printf 'a\200b\n' | python solution.py
ab
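Since the question asks about reading line by line rather than with a single read(), here is a minimal sketch of the second approach adapted to iterate over lines. The helper name iter_lines_ignoring_errors is made up for illustration, and the sample bytes stand in for real input.

```python
import io


def iter_lines_ignoring_errors(binary_stream, encoding="utf-8"):
    """Yield text lines from a binary stream, silently dropping undecodable bytes.

    Hypothetical helper for illustration; it wraps the stream the same way
    as io.TextIOWrapper(sys.stdin.buffer, errors='ignore').
    """
    text_stream = io.TextIOWrapper(binary_stream, encoding=encoding, errors="ignore")
    for line in text_stream:
        yield line


# Demo on an in-memory stream; b"\x80" and b"\xff" are not valid UTF-8 here.
sample = io.BytesIO(b"a\x80b\nc\xffd\n")
lines = list(iter_lines_ignoring_errors(sample))
print(lines)  # ['ab\n', 'cd\n']
```

To read real standard input, pass sys.stdin.buffer instead of the in-memory sample.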
You can set an error handler through the PYTHONIOENCODING environment variable; this affects both sys.stdin and sys.stdout (sys.stderr always uses "backslashreplace"). PYTHONIOENCODING accepts an optional encoding name and an optional error handler name preceded by a colon, so "UTF8", "UTF8:ignore" and ":ignore" are all valid values.
$ cat so73335410.py
import sys
if __name__ == '__main__':
    data = sys.stdin.read()
    print(data)
$
$ echo hello | python so73335410.py
hello
$ echo hello hello hello hello | zip > hello.zip
adding: - (deflated 54%)
$
$ cat hello.zip | PYTHONIOENCODING=UTF8:ignore python so73335410.py
UYv>
-▒
UY HW@'PKv>
▒-PK,-/>PKmPK/>
$
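If you would rather check the PYTHONIOENCODING behaviour from Python than from the shell, something like the following should work. It is only a sketch: CHILD_CODE and run_child_ignoring_errors are names invented for this example, and the child program simply echoes whatever it managed to decode from stdin.

```python
import os
import subprocess
import sys

# Hypothetical one-line child program: echo decoded stdin back to stdout.
CHILD_CODE = "import sys; sys.stdout.write(sys.stdin.read())"


def run_child_ignoring_errors(data: bytes) -> str:
    """Run CHILD_CODE with PYTHONIOENCODING=utf-8:ignore and return its stdout."""
    env = dict(os.environ, PYTHONIOENCODING="utf-8:ignore")
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input=data,           # raw bytes fed to the child's stdin
        env=env,
        capture_output=True,  # requires Python 3.7+
        check=True,
    )
    return result.stdout.decode("utf-8")


if __name__ == "__main__":
    # b"\x80" is not valid UTF-8 on its own, so the child drops it.
    print(repr(run_child_ignoring_errors(b"a\x80b\n")))
```

The variable is set only for the child process, so the parent interpreter's own streams are left unchanged.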
… sys.stdin.read(), or more specifically for line in sys.stdin, but it throws a UnicodeDecodeError. If I catch it, how can I read the line anyway? I just need to ignore symbols it can't read. The line mostly contains ASCII characters, but it can contain characters outside ASCII, so I need to just ignore them or replace them with '?', for example.
… bytes.decode function. I could do it if I were reading an actual file, like open(filename, 'r', errors='ignore'), but I want to read sys.stdin instead. It's already an opened file descriptor, so I don't know how to set the errors='ignore' option.
… try and inside you decode the data, and an except with a pass that will just ignore the data that fails to decode.
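For the "replace with '?'" variant mentioned above, one possible sketch is to iterate over the binary stream and decode each line with errors='replace', then swap the replacement character U+FFFD for the placeholder. The function name decode_lines_with_placeholder is hypothetical, and note the assumption that the input never contains a genuine U+FFFD, since that would be turned into '?' as well.

```python
import io


def decode_lines_with_placeholder(binary_stream, placeholder="?", encoding="utf-8"):
    """Yield decoded lines, substituting a placeholder for undecodable bytes.

    Hypothetical helper for illustration. errors='replace' inserts U+FFFD
    for bad input, which is then mapped to the placeholder.
    """
    for raw_line in binary_stream:
        text = raw_line.decode(encoding, errors="replace")
        yield text.replace("\ufffd", placeholder)


# Demo on an in-memory stream; with real input, pass sys.stdin.buffer instead.
sample = io.BytesIO(b"a\x80b\nok\n")
decoded = list(decode_lines_with_placeholder(sample))
print(decoded)  # ['a?b\n', 'ok\n']
```

Splitting on the newline byte before decoding is safe for UTF-8, because that byte never occurs inside a multi-byte sequence.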