
I want to read some data from a JSON URL. However, with my code I don't get a JSON structure; instead I get an undecoded string.

I've also tried reading the data directly from the URL using pandas read_json, but I get this error message: UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 136430: character maps to <undefined>.

This is the main code I'm using:

from urllib.request import urlopen
import json

response = urlopen("https://data.nasa.gov/resource/y77d-th95.json")
json_data = response.read().decode('utf8','replace')

And this is what I get in json_data:

'\x1f�\x08\x00\x00\x00\x00\x00\x00\x00̽K��ȕ&��_ልR����\n
i�MOfd*3+#��RI5]�F��:�:��%Ct�\x1b�ѨE�\x06�3jP��l\x06�\x02\n�A�\n
\x02��ƌ��<f4�c��Ũ^�P����^�����__u�S��o^}Y���\x15y�n��Gտ������\x0f�T��
\x1f�WCs\x7f��\x0f\x07�Gor��?��=�\x7f�m�߫\x7f��F�\x1f�ꥩ\x07�G�l�Q��\x7f�e
\x7f3��&˲��}}T\x7f%�6e�O\x7f�w\x0f�O�M&9��\x0f\x1f�~���Ƕ�^��
\x7f}u�I?�mwT��}�\x0f۶����%��"F�J��?���B\x00�aw:\x18�\x0cE�]1!,Yv\x13b�S
\x0cb��\x06�\x04�f\x1b�\x130\x1a9b�:0�ƀ,P��|\'&�4+Ͽ�\x16P�\x01\x15
\x1bF�����)��o���\x12�Z�\x18�\x0e����i\x7f�_wl�b5"��\x01�+*n#.
\x0b\x041-6r����TI�� 1J]Ļv��\x1b��8�7p\x0b���f�ʮ9ߨE�-m!6U�\x02t\x14$W�
\x0e���]��\x1f\'�ղ�,\x18�n��\x15���\r5�7�-�F�,��#Z�\x0b�î]\x7f�?l��
\x17�c�5���"��\r_��h`y\x05\x06X���m\\�Ӛ/\x01l�Q�\x00\x7f�\x1e\x1a^E\\���Y
膒T�@=7T�)h\x02\u038b\x18\x11�\x1b��To�\t�\\t\\i\x11zr8���9�\x14�-
�.�G�*HF�3���^}����+`AK\x1c0�+D\x00/��f\x1b鿟���)���E�
\x03��Oݪٯ�<\x0e��p��յ�b�����Wo���\x13|zC\x0fo�mk�b�\r�\x15�3��ˌ���Y�
\x18�\x0e����0!\x16��5\x13A�,ǀV�a8\x01�\x1b�b-^ĈQ�.�Ь\x0f�a���o^�
\x1a�(�?v�]�����,�\x0bY\x19��|;�\rO9�\x171b�:8\x1f���2�f���2�\u03a2�
\x0eS\x1bm~\x1b�|����Q�\x18�.Ȼzxw\x02�\x16��,�X��\x1d.q��\x1chY�\x19O�
\x1c1:]����aZO\x1e������̱���j�RnDʦ����N\x17���~�P����Cߑ��\x7f�rB�
\x07��Jb~\x1d�|x\x05��\x14k��\x11�ց�_��n�kH�\x07���ٺ�M\x11Z�\x12��b#<�
\x15\x1c�E��"G��\x19�\x7f���q7�0\'`P����,˰�U\x1ekQd�;��f,\x12\x0e�(G�N
\x17�������{:�,��p����r��\x1d�\x16|��\x11��ũn��xu�E���7�y�\x06\\�,
\x04�,"�\x16��$�t��l`#GF�3��p�.�<�H\x02k:Z\x18�o�\x14j\x01\x1b�M��{�#l
\x06�e�l����ǿ\x0eu7!���7��A��OO��\x11,n�\x02�\x15�O�|#
\x19u�k��(O�"j1b��ط���p���""\\\x1ay5)[\x023�s�)=!\'��Blg9bԺ`�w�3[��\x04�
\x0e9\x0b!-9��\x16��\n^�rCS\x0c\n#G��\x19�á�C;��\x02��`p���m�
.
.
.

Any idea what I'm doing wrong?

  • When I run that, json_data appears to be a string containing a JSON object. (Though the newline at the end needs to be stripped.) What do you get if you add json.loads(json_data.rstrip()) after that? Commented Jul 11, 2019 at 18:04
  • Adding your suggestion, I get JSONDecodeError: Expecting value: line 1 column 1 (char 0) Commented Jul 12, 2019 at 14:13

1 Answer


Calling json.loads directly on the result of read(), decoded with the default encoding, gives me a valid list.

Please try this:

from urllib.request import urlopen
import json

response = urlopen("https://data.nasa.gov/resource/y77d-th95.json")
json_data = json.loads(response.read().decode())

print(json_data)

1 Comment

Tried your suggestion, got UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
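
A possible explanation, offered as a guess rather than a confirmed diagnosis: the output in the question starts with \x1f, and the error in the comment above complains about byte 0x8b at position 1. The byte pair 0x1f 0x8b is the gzip magic number, so the server may be returning a gzip-compressed body in some environments, and urlopen does not decompress it automatically. Here is a minimal sketch under that assumption; the helper names parse_json_body and fetch_json are made up for illustration and are not part of any library.

```python
import gzip
import json
from urllib.request import urlopen


def parse_json_body(raw: bytes):
    """Parse a JSON payload that may or may not be gzip-compressed.

    Hypothetical helper: assumes the payload is UTF-8 encoded JSON.
    """
    # gzip streams start with the two magic bytes 0x1f 0x8b
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))


def fetch_json(url: str):
    """Download a URL and parse its body as JSON (hypothetical helper)."""
    with urlopen(url) as response:
        return parse_json_body(response.read())
```

With this in place, json_data = fetch_json("https://data.nasa.gov/resource/y77d-th95.json") should work whether or not the body arrives compressed. Checking the magic bytes instead of trusting the Content-Encoding header is a deliberate choice here, since the header may be missing or inaccurate.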
