I received this error after running my spider. I also have a pipeline that converts everything to JSON, but I still get this error when the item is returned:

TypeError: Object of type 'bytes' is not JSON serializable

My code is:


    import json
    import re
    import types

    SEPARATOR = '-'
    FILING_PROPERTIES = ['state_id', 'types', 'description', 'filing_parties', 'filed_on']
    DOCUMENT_PROPERTIES = ['types', 'title', 'blob_name', 'state_id', 'source_url']


    class AeeiPipeline(object):
        def process_item(self, item, spider):
            import pdb
            #
            if item.get('title', None):
                item['source_title'], item['title'] = self.title_case(item['title'])
            if item.get('description'):
                pdb.set_trace()
                item['description'] = self.title_case(item['description'])
            for filing in item.get("filings", []):
                if filing.get('description'):
                    pdb.set_trace()
                    filing['description'] = self.title_case(filing['description'])
                for _key in ["filing_parties", "types"]:
                    if not (_key in filing and filing[_key]):
                        filing[_key] = []
                    elif isinstance(filing[_key], str):
                        filing[_key] = [filing[_key]]

                for doc in filing.get("documents", []):
                    if doc.get('name'):
                        doc['name'] = doc['name']
                    if doc.get('title'):
                        doc['title'] = self.make_unicode(doc['title'])
                    if "types" in doc and not type(doc["types"]) is list:
                        doc["types"] = [doc["types"]]
            for _key in ["industries", "assignees", "major_parties", "source_assignees", "source_major_parties"]:
                if not (_key in item and item[_key]):
                    item[_key] = []
                elif isinstance(item[_key], str):
                    item[_key] = [item[_key]]

            for key, value in item.items():
                if type(item[key]) is str:
                    item[key] = value.strip()
            pdb.set_trace()
            item = json.dumps(item) + '\n'
            return item

        def title_case(self, title):
            title = self.make_unicode(title)
            return title, re.sub(u"[A-Za-z]+(('|\u2019)[A-Za-z]+)?",
                                 lambda mo: mo.group(0)[0].upper() + mo.group(0)[1:].lower(),
                                 title)
  • It means you have a bytes field in your dict. You have a few options: build your own JSON encoder, or simply cast to str. Does json.dumps(item, default=str) work? Commented Sep 20, 2019 at 7:02
  • I did this: item = json.dumps(item) + '\n', but got the error TypeError: Object of type 'PucItem' is not JSON serializable. Commented Sep 20, 2019 at 7:03
  • Please read "minimal reproducible example" to learn how to write a good question. My advice: avoid adding characters to a JSON string yourself; that is the job of the JSON encoder (which is called by the dumps method). Without the full traceback of your error (copy-pasted, not a screenshot), it is difficult to know what is wrong. Have you tried default=str without adding \n at the end, as suggested in my previous comment? Commented Sep 20, 2019 at 11:11
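
The two options from the first comment can be compared in a short sketch. The `record` dict and the `decode_bytes` helper are hypothetical names used only for illustration, and the sketch assumes the bytes hold UTF-8 text. Note that `default=str` avoids the TypeError but writes the repr of the bytes object rather than its decoded text:

```python
import json

# Hypothetical item field that holds bytes instead of str.
record = {"title": b"Rate Case 2019"}

# Option 1: default=str silences the TypeError, but it stores the repr of
# the bytes object (including the b'...' wrapper), not the decoded text.
with_str = json.dumps(record, default=str)


def decode_bytes(obj):
    """Fallback for json.dumps: decode bytes, reject everything else."""
    if isinstance(obj, bytes):
        # Assumption: the bytes are UTF-8 encoded text.
        return obj.decode("utf-8")
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# Option 2: a custom fallback that decodes bytes to text.
with_decode = json.dumps(record, default=decode_bytes)

print(with_str)     # {"title": "b'Rate Case 2019'"}
print(with_decode)  # {"title": "Rate Case 2019"}
```

If the decoded text is what should end up in the output, the custom fallback is probably the safer of the two.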

1 Answer

TypeError: Object of type 'PucItem' is not JSON serializable

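This error means your spider yields a Scrapy Item (here, PucItem), which json.dumps cannot serialize directly.

Either convert the item to a dict before dumping it:

    item = json.dumps(dict(item))

Or, in your spider, do not use an Item class to create the item at all; build it as a plain dict instead, e.g. item = {}.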
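
As a rough sketch of the dict(item) conversion, the following handles both errors from the question in one place. It does not import Scrapy: `StandInItem` is a hypothetical stand-in for a dict-like Item such as PucItem, and `json_fallback` and `item_to_json` are illustrative names, not part of any library. It assumes any bytes values are UTF-8 text.

```python
import json
from collections import UserDict
from collections.abc import Mapping


class StandInItem(UserDict):
    """Hypothetical stand-in for a Scrapy Item: dict-like, but not a dict."""


def json_fallback(obj):
    """Called by json.dumps for any object it cannot serialize itself."""
    if isinstance(obj, Mapping):
        # Nested dict-like items become plain dicts.
        return dict(obj)
    if isinstance(obj, bytes):
        # Assumption: the bytes are UTF-8 encoded text.
        return obj.decode("utf-8")
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def item_to_json(item):
    """Serialize a dict-like item to a JSON string."""
    return json.dumps(dict(item), default=json_fallback)


item = StandInItem({
    "title": b"Docket 42",
    "filings": [StandInItem({"description": "Initial filing"})],
})
line = item_to_json(item)
print(line)
```

Also note that Scrapy expects process_item to return an item (or raise DropItem), so it may be cleaner to write the JSON string to a file inside the pipeline and still return the original item, rather than returning the string.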
