python json issue with bytes type serializing

Question

I am following a tutorial to build a simple web scraper for a static website, but I get the following error: TypeError: Object of type bytes is not JSON serializable

Here is my code thus far:

from bs4 import BeautifulSoup
import requests
import json

url = 'http://ethans_fake_twitter_site.surge.sh/'
response = requests.get(url, timeout=5)
content = BeautifulSoup(response.content, "html.parser")
tweetArr = []

for tweet in content.findAll('div', attrs = {'class': 'tweetcontainer'}):
    tweetObject = {
        "author": tweet.find('h2', attrs= {'class': 'author'}).text.encode('utf-8'),
        "date": tweet.find('h5', attrs= {'class': 'dateTime'}).text.encode('utf-8'),
        "content": tweet.find('p', attrs= {'class': 'content'}).text.encode('utf-8'),
        "likes": tweet.find('p', attrs= {'class': 'likes'}).text.encode('utf-8'),
        "shares": tweet.find('p', attrs= {'class': 'shares'}).text.encode('utf-8')
    }
    tweetArr.append(tweetObject)
with open('twitterData.json', 'w') as outfile:
    json.dump(tweetArr, outfile)

The only thing I can assume is wrong is that the article is using an earlier version of Python, but the article is quite recent, so that shouldn't be the case. The code runs and the JSON file is created, but the only data in it is "author:". Sorry if the answer is obvious to some of you, but I'm just starting to learn.

Here's the entire error log:

(tutorial-env) C:\Users\afaal\Desktop\python\webscraper>python webscraper.py

Traceback (most recent call last):
  File "webscraper.py", line 20, in <module>
    json.dump(tweetArr, outfile)
  File "C:\Users\afaal\AppData\Local\Programs\Python\Python38\lib\json\__init__.py", line 179, in dump
    for chunk in iterable:
  File "C:\Users\afaal\AppData\Local\Programs\Python\Python38\lib\json\encoder.py", line 429, in _iterencode
    yield from _iterencode_list(o, _current_indent_level)
  File "C:\Users\afaal\AppData\Local\Programs\Python\Python38\lib\json\encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "C:\Users\afaal\AppData\Local\Programs\Python\Python38\lib\json\encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "C:\Users\afaal\AppData\Local\Programs\Python\Python38\lib\json\encoder.py", line 438, in _iterencode
    o = _default(o)
  File "C:\Users\afaal\AppData\Local\Programs\Python\Python38\lib\json\encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type bytes is not JSON serializable

Please share the entire error message. Why all the .text.encode('utf-8')?
– AMC, Commented Feb 26, 2020 at 20:07
@AMC Done. Just following a tutorial, please forward your question to Ethan Jarell from HackerNoon. ;)
– John Doe, Commented Feb 26, 2020 at 21:18
@juanpa.arrivillaga And how exactly would I go about doing that?
– John Doe, Commented Feb 26, 2020 at 21:19
@JohnDoe You need to keep the .text part. It requires a str object, not bytes or whatever custom type your library is using. Honestly, you really need to do some basic research on JSON serialization in Python. This sort of cargo-cult programming is not an effective way to learn anything.
– juanpa.arrivillaga, Commented Feb 27, 2020 at 22:12
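
The point made in the comments can be reproduced without any scraping. The sketch below is a minimal illustration, not the tutorial's code: author_text is a made-up stand-in for the value returned by tweet.find(...).text. In Python 3, calling .encode('utf-8') on a str produces bytes, which the json module refuses to serialize. It also likely explains the half-written file: json.dump writes its output in chunks, so whatever precedes the first bytes value is already in the file when the exception is raised.

```python
import json

# Hypothetical sample value standing in for tweet.find(...).text
author_text = "Jane Example"

# In Python 3, str.encode() returns bytes, which json cannot serialize.
as_bytes = {"author": author_text.encode("utf-8")}
try:
    json.dumps(as_bytes)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Keeping the value as a str works without any extra steps.
as_text = {"author": author_text}
serialized = json.dumps(as_text)
print(serialized)
```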

John Doe · Accepted Answer · 2020-02-27 22:40:05Z

OK, so it turns out I needed to remove everything after ".text" and also search for "JSON serialization" (I had only searched for my specific TypeError and didn't find any conclusive information). The correct code would then be as follows, in case any amateur like myself is having the same problem:

from bs4 import BeautifulSoup
import requests
import json

url = 'http://ethans_fake_twitter_site.surge.sh/'
response = requests.get(url, timeout=5)
content = BeautifulSoup(response.content, "html.parser")
tweetArr = []

for tweet in content.findAll('div', attrs = {'class': 'tweetcontainer'}):
    tweetObject = {
        "author": tweet.find('h2', attrs= {'class': 'author'}).text,
        "date": tweet.find('h5', attrs= {'class': 'dateTime'}).text,
        "content": tweet.find('p', attrs= {'class': 'content'}).text,
        "likes": tweet.find('p', attrs= {'class': 'likes'}).text,
        "shares": tweet.find('p', attrs= {'class': 'shares'}).text
    }
    tweetArr.append(tweetObject)
with open('twitterData.json', 'w') as outfile:
    json.dump(tweetArr, outfile)

All credit to @juanpa.arrivillaga, thanks a lot for clearing this up completely!
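
For cases where the bytes come from a source you cannot change, one possible alternative is to decode them at serialization time. This is only a sketch under assumptions: decode_bytes and the sample record are hypothetical, and it assumes the bytes are valid UTF-8. It relies on the default parameter of json.dumps, which is called for any value the encoder cannot handle on its own.

```python
import json


def decode_bytes(obj):
    """Fallback for json.dumps: decode UTF-8 bytes, reject anything else."""
    if isinstance(obj, bytes):
        return obj.decode("utf-8")
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# Hypothetical record mixing a bytes value with a str value
record = {"author": b"Jane Example", "likes": "Likes 5"}
result = json.dumps(record, default=decode_bytes)
print(result)
```

For the scraper above, simply dropping .encode('utf-8') remains the simpler fix, since .text already returns a str.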

1 Comment

JonSG Over a year ago

you might want open('twitterData.json', 'w', encoding="utf-8")
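
A minimal sketch of that suggestion, using made-up sample data and a temporary directory rather than the real scraper output. By default json.dump escapes non-ASCII characters, so the file encoding matters most once ensure_ascii=False is passed to keep characters such as "ë" readable in the file; in that case an explicit encoding="utf-8" avoids depending on the platform's default encoding.

```python
import json
import os
import tempfile

# Hypothetical sample data with non-ASCII characters
tweets = [{"author": "Zoë", "content": "café ☕"}]

# Write to a temporary directory so the example leaves the working directory alone
path = os.path.join(tempfile.mkdtemp(), "twitterData.json")

# An explicit encoding keeps the output independent of the platform default
with open(path, "w", encoding="utf-8") as outfile:
    json.dump(tweets, outfile, ensure_ascii=False, indent=2)

# Read the file back with the same encoding to confirm the round trip
with open(path, encoding="utf-8") as infile:
    loaded = json.load(infile)

print(loaded == tweets)
```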
