Json loading with stdin failure

Question

I am trying to load my JSON file through stdin from the Windows command line, running python algo.py < number.json and calling json.loads(sys.stdin) in my script, but it fails.

However, I can load the same JSON file directly with:

import json

with open('number.json', encoding='utf-8-sig') as f:
    n = json.load(f)  # json.load takes a file object; json.loads(f) would raise the same TypeError

Exception raised when using json.loads(sys.stdin):

the JSON object must be str, bytes or bytearray, not TextIOWrapper

Exception raised when using json.load(sys.stdin) or json.loads(sys.stdin.read()):

Expecting value: line 1 column 1 (char 0)

Has anyone encountered the same issue? I read several posts on this forum before asking for help.

Here is the json file:

[
  {
    "x": 1,
    "y": 4,
    "z": -1,
    "t": 2
  },
  {
    "x": 2,
    "y": -1,
    "z": 3,
    "t": 0
  }
]

What is the error message exactly? json.load(sys.stdin) works for me with a proper JSON file. – The Pjot, commented Jul 1, 2019 at 9:53

The exception says the argument must be str, bytes or bytearray, not a TextIOWrapper. @KlausD. – eliaroseX, commented Jul 1, 2019 at 9:53

rtoijala · Accepted Answer · 2019-07-01 12:50:59Z

Based on your comments, your problem seems to be that your file starts with a UTF-8 BOM (byte order mark). That means the three extra bytes 0xEF 0xBB 0xBF come first in your file, before the actual JSON text.
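To confirm that a BOM is really the culprit, you can look at the first raw bytes of the file. A minimal sketch; has_utf8_bom is a hypothetical helper name, and number.json is the file from the question:

```python
import codecs

def has_utf8_bom(path):
    """Return True if the file at path starts with the UTF-8 BOM (EF BB BF)."""
    # Binary mode: inspect the raw bytes without any decoding
    with open(path, 'rb') as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8

# Example: print(has_utf8_bom('number.json'))
```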

The documentation of Python's json module states that its deserializer raises a ValueError when an initial BOM is present. Therefore you must remove the BOM before passing the JSON data to json.load or json.loads.
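The sketch below reproduces the "Expecting value" error from the question without needing a file. It assumes (the question does not confirm this) that redirected stdin on the asker's Windows machine is decoded with the cp1252 code page, which turns the three BOM bytes into the characters 'ï»¿'; json then sees a character that cannot start a JSON value at position 0.

```python
import json

# UTF-8 BOM followed by ordinary JSON text
raw = b'\xef\xbb\xbf[{"x": 1, "y": 4}]'

def parse_error(text):
    """Return the JSONDecodeError message for text, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return str(exc)
    return None

# Assumed Windows code page: the BOM becomes 'ï»¿', so parsing fails at char 0
cp1252_error = parse_error(raw.decode('cp1252'))
# Plain utf-8: the BOM becomes U+FEFF, which json also rejects
utf8_error = parse_error(raw.decode('utf-8'))
# utf-8-sig: the BOM is dropped and parsing succeeds
utf8_sig_error = parse_error(raw.decode('utf-8-sig'))
print(cp1252_error, utf8_error, utf8_sig_error, sep='\n')
```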

There are at least three ways to remove the BOM. The best is to simply edit your JSON file to remove it. If that is not possible, you can skip it in your Python code.
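If you would rather strip the BOM with a script than with an editor, a minimal sketch could look like this. strip_utf8_bom is a hypothetical helper; it assumes the file is UTF-8 and rewrites it in place.

```python
def strip_utf8_bom(path):
    """Rewrite the file at path in place as UTF-8 without a BOM."""
    # utf-8-sig drops a leading BOM if present; newline='' keeps CRLF line endings intact
    with open(path, encoding='utf-8-sig', newline='') as f:
        text = f.read()
    # Plain utf-8 never writes a BOM
    with open(path, 'w', encoding='utf-8', newline='') as f:
        f.write(text)

# Example: strip_utf8_bom('number.json')
```

Running it on a file that has no BOM leaves the file's bytes unchanged.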

If you only need your code to work with files that always contain a BOM, you can read and check the first three bytes yourself:

bom = sys.stdin.buffer.read(3)  # read outside the assert: python -O strips asserts
assert bom == b'\xEF\xBB\xBF', 'expected a UTF-8 BOM'

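This consumes the first three bytes of standard input and makes sure they really were the UTF-8 BOM. After that, json.load(sys.stdin) reads the remaining JSON as usual.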
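Put together, the "BOM is always present" approach might look like the sketch below. load_json_after_bom is a hypothetical name; it takes a binary stream (sys.stdin.buffer in algo.py) and decodes the rest explicitly as UTF-8 rather than relying on the console code page. An io.BytesIO object stands in for stdin so the sketch can run on its own.

```python
import io
import json

UTF8_BOM = b'\xEF\xBB\xBF'

def load_json_after_bom(binary_stream):
    """Consume a mandatory UTF-8 BOM from binary_stream, then parse the rest as JSON."""
    bom = binary_stream.read(len(UTF8_BOM))
    if bom != UTF8_BOM:
        # An explicit check, unlike assert, is not removed by python -O
        raise ValueError(f'expected a UTF-8 BOM, got {bom!r}')
    # Decode the remaining bytes as UTF-8 instead of the console code page
    return json.loads(binary_stream.read().decode('utf-8'))

# In-memory stand-in for stdin; in algo.py you would pass sys.stdin.buffer
sample = io.BytesIO(UTF8_BOM + b'[{"x": 1, "y": 4, "z": -1, "t": 2}]')
data = load_json_after_bom(sample)
```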

If you need to work with files that may or may not contain a BOM, you can wrap the standard input stream in a TextIOWrapper with the utf-8-sig encoding, as suggested in another answer. The code then looks like this:

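import io
import sys

# utf-8-sig decodes UTF-8 and drops a leading BOM if there is one
stdin_wrapper = io.TextIOWrapper(sys.stdin.buffer, encoding='utf-8-sig')
# use stdin_wrapper instead of sys.stdin, e.g. data = json.load(stdin_wrapper)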
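Wrapped up as a reusable function, the TextIOWrapper approach could be sketched as follows. load_json_any_bom is a hypothetical name; the detach() call is a design choice so that the wrapper, when garbage-collected, does not close the underlying stream (sys.stdin.buffer in the real script, io.BytesIO here).

```python
import io
import json

def load_json_any_bom(binary_stream):
    """Parse JSON from a binary stream that may or may not start with a UTF-8 BOM."""
    # utf-8-sig skips a leading BOM if present and otherwise behaves like utf-8
    wrapper = io.TextIOWrapper(binary_stream, encoding='utf-8-sig')
    try:
        return json.load(wrapper)
    finally:
        # Detach so the wrapper does not close binary_stream when it is discarded
        wrapper.detach()

# Usage in algo.py (assumed): data = load_json_any_bom(sys.stdin.buffer)
with_bom = load_json_any_bom(io.BytesIO(b'\xef\xbb\xbf[{"x": 2, "y": -1}]'))
without_bom = load_json_any_bom(io.BytesIO(b'[{"x": 2, "y": -1}]'))
```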

Quoting the Python Unicode HOWTO on why utf-8-sig is the codec to use:

In some areas, it is also convention to use a “BOM” at the start of UTF-8 encoded files; the name is misleading since UTF-8 is not byte-order dependent. The mark simply announces that the file is encoded in UTF-8. For reading such files, use the ‘utf-8-sig’ codec to automatically skip the mark if present.
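On Python 3.7 and later, TextIOWrapper.reconfigure() offers a shorter way to get the same utf-8-sig decoding on sys.stdin itself, as long as it is called before anything has been read from the stream. The sketch checks this on an in-memory stream; starting from cp1252 is an assumption standing in for a Windows console code page, and reconfigure_for_bom is a hypothetical helper name.

```python
import io
import json

def reconfigure_for_bom(text_stream):
    """Switch an unread text stream to utf-8-sig so a leading BOM is skipped."""
    # Must happen before the first read; the encoding cannot be changed afterwards
    text_stream.reconfigure(encoding='utf-8-sig')
    return text_stream

# Simulated stdin: BOM + JSON bytes behind a cp1252 text layer
fake_stdin = io.TextIOWrapper(io.BytesIO(b'\xef\xbb\xbf[{"x": 1}]'), encoding='cp1252')
data = json.load(reconfigure_for_bom(fake_stdin))

# In algo.py: sys.stdin.reconfigure(encoding='utf-8-sig'); data = json.load(sys.stdin)
```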
