
I am following a tutorial to build a simple web scraper for a static website, but I get the following error: TypeError: Object of type bytes is not JSON serializable

Here is my code thus far:

from bs4 import BeautifulSoup
import requests
import json

url = 'http://ethans_fake_twitter_site.surge.sh/'
response = requests.get(url, timeout=5)
content = BeautifulSoup(response.content, "html.parser")
tweetArr = []

for tweet in content.findAll('div', attrs = {'class': 'tweetcontainer'}):
    tweetObject = {
        "author": tweet.find('h2', attrs= {'class': 'author'}).text.encode('utf-8'),
        "date": tweet.find('h5', attrs= {'class': 'dateTime'}).text.encode('utf-8'),
        "content": tweet.find('p', attrs= {'class': 'content'}).text.encode('utf-8'),
        "likes": tweet.find('p', attrs= {'class': 'likes'}).text.encode('utf-8'),
        "shares": tweet.find('p', attrs= {'class': 'shares'}).text.encode('utf-8')
    }
    tweetArr.append(tweetObject)
with open('twitterData.json', 'w') as outfile:
    json.dump(tweetArr, outfile)

The only thing I can think of is that the article uses an earlier version of Python, but the article is quite recent, so that shouldn't be the case. The code runs and the JSON file is created, but the only data in it is "author:". Sorry if the answer is obvious to some of you; I'm just starting to learn.
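The partially written file is a side effect of how json.dump works: it encodes the object incrementally and writes each chunk to the file as it goes (the traceback's "for chunk in iterable:" line). Everything before the first bytes value is therefore already written when the TypeError is raised, and the with block still closes the file. A minimal sketch that reproduces this in memory, assuming io.StringIO as a stand-in for the real file and made-up tweet values:

```python
import io
import json

# Hypothetical record shaped like tweetObject, with values still encoded to bytes.
records = [{"author": "Ethan".encode("utf-8"), "date": "Feb 26".encode("utf-8")}]

buffer = io.StringIO()  # in-memory stand-in for the file opened with open(..., 'w')
error_message = None
try:
    json.dump(records, buffer)
except TypeError as exc:
    error_message = str(exc)

# json.dump writes chunk by chunk, so the text produced before the bytes value
# was reached is already in the "file" when the TypeError is raised.
partial_output = buffer.getvalue()
print(error_message)         # Object of type bytes is not JSON serializable
print(repr(partial_output))  # '[{"author": '
```

On disk the same thing happens, which would explain a twitterData.json that stops right after the first key.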

Here's the entire error log:

(tutorial-env) C:\Users\afaal\Desktop\python\webscraper>python webscraper.py

Traceback (most recent call last):
  File "webscraper.py", line 20, in <module>
    json.dump(tweetArr, outfile)
  File "C:\Users\afaal\AppData\Local\Programs\Python\Python38\lib\json\__init__.py", line 179, in dump
    for chunk in iterable:
  File "C:\Users\afaal\AppData\Local\Programs\Python\Python38\lib\json\encoder.py", line 429, in _iterencode
    yield from _iterencode_list(o, _current_indent_level)
  File "C:\Users\afaal\AppData\Local\Programs\Python\Python38\lib\json\encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "C:\Users\afaal\AppData\Local\Programs\Python\Python38\lib\json\encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "C:\Users\afaal\AppData\Local\Programs\Python\Python38\lib\json\encoder.py", line 438, in _iterencode
    o = _default(o)
  File "C:\Users\afaal\AppData\Local\Programs\Python\Python38\lib\json\encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type bytes is not JSON serializable
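The traceback comes down to types. In Python 3, .text gives a str, str.encode('utf-8') returns a bytes object, and the json module only serializes str, numbers, booleans, None, lists/tuples and dicts, not bytes. (In Python 2, .encode('utf-8') returned a plain byte string that json accepted, which may be where the tutorial's habit comes from; that is a guess, not something the tutorial states.) A small sketch with a made-up value, independent of BeautifulSoup:

```python
import json

# Hypothetical stand-in for tweet.find(...).text, which is a str.
author_text = "Ethan"
# Calling .encode('utf-8') on it produces a bytes object instead.
author_bytes = author_text.encode("utf-8")

print(type(author_text).__name__, type(author_bytes).__name__)  # str bytes

# str values serialize without complaint.
print(json.dumps({"author": author_text}))  # {"author": "Ethan"}

# bytes values are rejected by the default encoder.
try:
    json.dumps({"author": author_bytes})
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # Object of type bytes is not JSON serializable

# Decoding gets the original str back.
print(author_bytes.decode("utf-8") == author_text)  # True
```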

  • Please share the entire error message. Why all the .text.encode('utf-8')? Commented Feb 26, 2020 at 20:07
  • Stop creating bytes objects and keep the strings? Commented Feb 26, 2020 at 21:18
  • @AMC Done. I'm just following a tutorial; please forward your question to Ethan Jarell from HackerNoon. ;) Commented Feb 26, 2020 at 21:18
  • @juanpa.arrivillaga And how exactly would I go about doing that? Commented Feb 26, 2020 at 21:19
  • @JohnDoe you need to keep the .text part. It requires a str object, not bytes or whatever custom type your library is using. Honestly, you really need to do some basic research on JSON serialization in Python. This sort of cargo-cult programming is not an effective way to learn anything. Commented Feb 27, 2020 at 22:12
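As the comments say, the real fix is to keep the str values. If bytes ever arrive from somewhere you don't control, json.dump and json.dumps accept a default callable that the encoder calls for any object it can't serialize; it must return something serializable or raise TypeError. A sketch of that hook, where decode_bytes is a hypothetical helper name and UTF-8 is assumed to be the right encoding for the bytes:

```python
import json


def decode_bytes(obj):
    """Hypothetical fallback for json.dump(s): decode bytes as UTF-8, reject anything else."""
    if isinstance(obj, bytes):
        return obj.decode("utf-8")
    # Anything else is still unsupported, so raise TypeError as the stdlib does.
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


record = {"author": "Ethan".encode("utf-8")}  # hypothetical value
print(json.dumps(record, default=decode_bytes))  # {"author": "Ethan"}
```

This only papers over the bytes after the fact; dropping the .encode('utf-8') calls avoids the round trip entirely.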

1 Answer


OK, so it turns out I needed to remove everything after ".text" (the .encode('utf-8') calls), and I should simply have searched for "JSON serialization" (I had only searched for my specific TypeError and didn't find anything conclusive). The corrected code is as follows, in case any amateur like myself runs into the same problem:

from bs4 import BeautifulSoup
import requests
import json

url = 'http://ethans_fake_twitter_site.surge.sh/'
response = requests.get(url, timeout=5)
content = BeautifulSoup(response.content, "html.parser")
tweetArr = []

# .text already returns a str, which json can serialize; no .encode() needed.
for tweet in content.find_all('div', attrs={'class': 'tweetcontainer'}):
    tweetObject = {
        "author": tweet.find('h2', attrs={'class': 'author'}).text,
        "date": tweet.find('h5', attrs={'class': 'dateTime'}).text,
        "content": tweet.find('p', attrs={'class': 'content'}).text,
        "likes": tweet.find('p', attrs={'class': 'likes'}).text,
        "shares": tweet.find('p', attrs={'class': 'shares'}).text
    }
    tweetArr.append(tweetObject)

# Write the list of dicts to disk as JSON.
with open('twitterData.json', 'w') as outfile:
    json.dump(tweetArr, outfile)
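To confirm that the fixed script now produces valid JSON, one option is to read the file back with json.load and compare. A sketch with placeholder records instead of scraped ones, so it runs without network access; the temporary path and indent=2 are assumptions added here, not part of the answer:

```python
import json
import os
import tempfile

# Hypothetical records shaped like tweetObject after the fix: every value is a str.
tweet_arr = [
    {"author": "author 1", "date": "date 1", "content": "content 1",
     "likes": "likes 1", "shares": "shares 1"},
]

# Write to a throwaway directory so the sketch does not touch twitterData.json.
path = os.path.join(tempfile.mkdtemp(), "twitterData.json")
with open(path, "w") as outfile:
    json.dump(tweet_arr, outfile, indent=2)  # indent only makes the file easier to read

# Reading the file back should give an equal list of dicts.
with open(path) as infile:
    loaded = json.load(infile)

print(loaded == tweet_arr)  # True
```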

All credit to @juanpa.arrivillaga; thanks a lot for clearing this up completely!


1 Comment

You might want open('twitterData.json', 'w', encoding="utf-8").
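The explicit encoding matters mainly if you change json.dump's defaults. By default ensure_ascii=True escapes every non-ASCII character as \uXXXX, so the written text is pure ASCII. With ensure_ascii=False the characters are written as-is, and without an explicit encoding open() falls back to a platform default (on Windows often a legacy code page), which may fail or produce a file other tools misread. A sketch with a made-up non-ASCII name and a temporary path:

```python
import json
import os
import tempfile

record = {"author": "José"}  # hypothetical non-ASCII value

# Default (ensure_ascii=True): non-ASCII characters become \uXXXX escapes,
# so the output is plain ASCII and the file encoding hardly matters.
escaped = json.dumps(record)
print(escaped)  # {"author": "Jos\u00e9"}

# ensure_ascii=False keeps the character itself, so the file must be opened
# with an encoding that can represent it; UTF-8 is the usual choice for JSON.
raw = json.dumps(record, ensure_ascii=False)

path = os.path.join(tempfile.mkdtemp(), "twitterData.json")
with open(path, "w", encoding="utf-8") as outfile:
    json.dump([record], outfile, ensure_ascii=False)
with open(path, encoding="utf-8") as infile:
    text = infile.read()
```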
