I am trying to load a JSON file from stdin on the Windows command line with python algo.py < number.json, calling json.loads(sys.stdin) in my script, but it fails.

However, I can load my json with

with open('number.json', encoding='utf-8-sig') as f:
    n = json.load(f)

Exception raised when using json.loads(sys.stdin):

the JSON object must be str, bytes or bytearray, not TextIOWrapper

Exception raised when using json.load(sys.stdin) or json.loads(sys.stdin.read()):

Expecting value: line 1 column 1 (char 0)

Has anyone encountered the same issue? I read multiple posts on this forum before asking for help.

Here is the json file:

[
  {
    "x": 1,
    "y": 4,
    "z": -1,
    "t": 2
  },
  {
    "x": 2,
    "y": -1,
    "z": 3,
    "t": 0
  }
]
  • json.load(sys.stdin) (without s) Commented Jul 1, 2019 at 9:45
  • both load and loads fails with the sys.stdin method. Commented Jul 1, 2019 at 9:48
  • What is the error message exactly? json.load(sys.stdin) works for me with a proper json file. Commented Jul 1, 2019 at 9:53
  • The exception raises that it should be a string, bytes but not a "TextIOWrapper" @KlausD. Commented Jul 1, 2019 at 9:53
  • json.load(open('number.json')) will definitely work! Commented Jul 1, 2019 at 9:57

1 Answer

Based on your comments, the problem seems to be that your file starts with a UTF-8 BOM, that is, the three extra bytes 0xEF 0xBB 0xBF come before the JSON text.

The Python json module documentation says that it does not accept a BOM. Therefore you must remove it before passing the JSON data to json.load or json.loads.
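To see why the message is "Expecting value" rather than something that mentions a BOM, here is a minimal sketch. It assumes the redirected stdin was decoded as cp1252, a common default on Windows; your code page may differ. The sample payload is made up for illustration.

```python
import json

# Made-up sample: a UTF-8 BOM followed by a small JSON document.
payload = b'\xef\xbb\xbf[{"x": 1}]'

# Assumption: the stream is decoded with cp1252, so the BOM bytes
# turn into three ordinary characters in front of the '['.
mis_decoded = payload.decode('cp1252')
try:
    json.loads(mis_decoded)
    error_message = None
except json.JSONDecodeError as exc:
    error_message = str(exc)

# Decoding with 'utf-8-sig' drops the BOM, so the parser sees clean JSON.
data = json.loads(payload.decode('utf-8-sig'))

print(error_message)
print(data)
```

The parser fails at the very first character because the mis-decoded BOM is not a valid start of a JSON value.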

There are at least three ways to remove the BOM. The best is to simply edit your JSON file to remove it. If that is not possible, you can skip it in your Python code.

If you only need your code to work with files that contain a BOM, you can use:

bom = sys.stdin.buffer.read(3)  # consume the first three bytes
assert bom == b'\xEF\xBB\xBF'   # fails if they were not the UTF-8 BOM

This consumes the first three bytes and makes sure that they really were the UTF-8 BOM. Do it before anything else reads from sys.stdin.
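If you would rather not rely on assert (assertions are skipped when Python runs with -O), the same check can be written with an explicit exception. This is only a sketch: read_json_requiring_bom is a hypothetical helper name, and io.BytesIO stands in for sys.stdin.buffer so the example runs without redirected input.

```python
import codecs
import io
import json


def read_json_requiring_bom(binary_stream):
    """Hypothetical helper: insist on a UTF-8 BOM, then parse the rest."""
    prefix = binary_stream.read(len(codecs.BOM_UTF8))
    if prefix != codecs.BOM_UTF8:
        raise ValueError(f"expected UTF-8 BOM, got {prefix!r}")
    # The BOM is already consumed, so plain UTF-8 decoding is enough.
    return json.loads(binary_stream.read().decode('utf-8'))


# io.BytesIO simulates sys.stdin.buffer here (an assumption for the demo).
parsed = read_json_requiring_bom(io.BytesIO(codecs.BOM_UTF8 + b'[{"t": 2}]'))

try:
    read_json_requiring_bom(io.BytesIO(b'[{"t": 2}]'))
    rejected = False
except ValueError:
    rejected = True

print(parsed, rejected)
```

In a real script you would pass sys.stdin.buffer instead of the BytesIO object.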

If you need to work with files that may or may not contain a BOM, you can wrap your standard input stream with a TextIOWrapper with the correct encoding, as mentioned in this answer. Then the code looks like this:

import io
import json
import sys
stdin_wrapper = io.TextIOWrapper(sys.stdin.buffer, encoding='utf-8-sig')
n = json.load(stdin_wrapper)  # use stdin_wrapper instead of sys.stdin

Quoting the Python Unicode HOWTO for why utf-8-sig:

In some areas, it is also convention to use a “BOM” at the start of UTF-8 encoded files; the name is misleading since UTF-8 is not byte-order dependent. The mark simply announces that the file is encoded in UTF-8. For reading such files, use the ‘utf-8-sig’ codec to automatically skip the mark if present.
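As a quick check that the utf-8-sig wrapper handles both cases, here is a small sketch. load_json_stream is a hypothetical helper name, and io.BytesIO objects stand in for sys.stdin.buffer; the sample data is made up.

```python
import io
import json


def load_json_stream(binary_stream):
    """Hypothetical helper: parse JSON from a binary stream, BOM or not."""
    wrapper = io.TextIOWrapper(binary_stream, encoding='utf-8-sig')
    try:
        return json.load(wrapper)
    finally:
        # Detach so the underlying stream is not closed with the wrapper.
        wrapper.detach()


# io.BytesIO simulates sys.stdin.buffer here (an assumption for the demo).
with_bom = io.BytesIO(b'\xef\xbb\xbf[{"x": 1, "y": 4}]')
without_bom = io.BytesIO(b'[{"x": 1, "y": 4}]')

result_with = load_json_stream(with_bom)
result_without = load_json_stream(without_bom)

print(result_with == result_without)
```

In your script the call would be load_json_stream(sys.stdin.buffer), after importing sys.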
