Getting wrong result from JSON - Python 3

Question

I'm working on a small project that retrieves information about books from the Google Books API using Python 3. For this I make a call to the API, read out the variables and store those in a list. For a search like "linkedin" this works perfectly. However, when I enter "Google", it skips the first result and prints the second title from the JSON response. How can this happen?

Please find my code below (Google_Results is the class I use to initialize the variables):

import requests
def Book_Search(search_term):
    parms = {"q": search_term, "maxResults": 3}
    r = requests.get(url="https://www.googleapis.com/books/v1/volumes", params=parms)
    print(r.url)

    results = r.json()
    i = 0
    for result in results["items"]:
        try:
            isbn13 = str(result["volumeInfo"]["industryIdentifiers"][0]["identifier"])
            isbn10 = str(result["volumeInfo"]["industryIdentifiers"][1]["identifier"])
            title = str(result["volumeInfo"]["title"])
            author = str(result["volumeInfo"]["authors"])[2:-2]
            publisher = str(result["volumeInfo"]["publisher"])
            published_date = str(result["volumeInfo"]["publishedDate"])
            description = str(result["volumeInfo"]["description"])
            pages = str(result["volumeInfo"]["pageCount"])
            genre = str(result["volumeInfo"]["categories"])[2:-2]
            language = str(result["volumeInfo"]["language"])
            image_link = str(result["volumeInfo"]["imageLinks"]["thumbnail"])

            dict = Google_Results(isbn13, isbn10, title, author, publisher, published_date, description, pages, genre,
                           language, image_link)
            gr.append(dict)
            print(gr[i].title)
            i += 1
        except:
            pass
    return

gr = []
Book_Search("Linkedin")

I am a beginner to Python, so any help would be appreciated!

@Coldspeed thanks, I actually didn't, and now I see that it appears at least twice, in the first and last result (I am asking for 3 results). However, how do I move on from here? A JSON file should always offer me the same structure, right? I checked the URL in the browser and there is a result there: "title": "Ontdek Google Chrome, Gmail, Google Foto's en Google Drive"
– Vincent, Commented Jul 22, 2017 at 20:53
@IgnacioVazquez-Abrams Do you mean where I get the information from? That would be the Google API, using the requests module. But please do let me know if you mean anything different.
– Vincent, Commented Jul 22, 2017 at 20:56

zwer · Accepted Answer · 2017-07-22 21:43:43Z

It does so because there is no publisher entry in volumeInfo of the first entry, thus it raises a KeyError and your except captures it. If you're going to work with fuzzy data you have to account for the fact that it will not always have the expected structure. For simple cases you can rely on dict.get() and its default argument to return a 'valid' default entry if an entry is missing.
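To illustrate, here is a minimal sketch of the difference between indexing and dict.get(). The record below is made-up sample data that only mimics the shape of one API item; it is not real API output.

```python
# Hypothetical record shaped like one item from the API.
# Note that "publisher" and "imageLinks" are deliberately absent.
item = {"volumeInfo": {"title": "Some Title", "authors": ["A. Author"]}}

info = item.get("volumeInfo", {})

# Plain indexing raises KeyError when the key is missing.
try:
    publisher = info["publisher"]
except KeyError:
    publisher = None

# .get() returns the supplied default instead of raising.
publisher = info.get("publisher", "unknown")

# For nested values, chain .get() with an empty dict as the default.
thumbnail = info.get("imageLinks", {}).get("thumbnail")

print(publisher, thumbnail)
```

With a default in place, one missing field no longer discards the whole record.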

Also, there are a few conceptual problems with your function: it relies on a global gr, which is bad design; it shadows the built-in dict type; and its bare except catches every exception, including KeyboardInterrupt, so you cannot stop your code even with a SIGINT. I'd suggest converting it to something a bit more sane:

import requests

def book_search(search_term, max_results=3):
    results = []  # a list to store the results
    parms = {"q": search_term, "maxResults": max_results}
    r = requests.get(url="https://www.googleapis.com/books/v1/volumes", params=parms)
    try:  # just in case the server doesn't return valid JSON
        for result in r.json().get("items", []):
            if "volumeInfo" not in result:  # invalid entry - missing volumeInfo
                continue
            result_dict = {}  # a dictionary to store our discovered fields
            result = result["volumeInfo"]  # all the data we're interested is in volumeInfo
            isbns = result.get("industryIdentifiers", None)  # capture ISBNs
            if isinstance(isbns, list) and isbns:
                # match on each identifier's "type" rather than its position in the list
                for entry in isbns:
                    if not isinstance(entry, dict):
                        continue
                    key = {"ISBN_10": "isbn10", "ISBN_13": "isbn13"}.get(entry.get("type"))
                    if key:
                        result_dict[key] = entry.get("identifier", None)
            result_dict["title"] = result.get("title", None)
            authors = result.get("authors", None)  # capture authors
            if isinstance(authors, list) and authors:
                # authors is a list of names; join it instead of slicing the list
                result_dict["author"] = ", ".join(str(a) for a in authors)
            result_dict["publisher"] = result.get("publisher", None)
            result_dict["published_date"] = result.get("publishedDate", None)
            result_dict["description"] = result.get("description", None)
            result_dict["pages"] = result.get("pageCount", None)
            genres = result.get("categories", None)  # capture genres
            if isinstance(genres, list) and genres:
                # categories is a list of strings; join it instead of slicing the list
                result_dict["genre"] = ", ".join(str(g) for g in genres)
            result_dict["language"] = result.get("language", None)
            result_dict["image_link"] = result.get("imageLinks", {}).get("thumbnail", None)
            # make sure Google_Results accepts keyword arguments like title, author...
            # and make them optional as they might not be in the returned result
            gr = Google_Results(**result_dict)
            results.append(gr)  # add it to the results list
    except ValueError:
        return None  # invalid response returned, you may raise an error instead
    return results  # return the results

Then you can easily retrieve as much info as possible for a term:

gr = book_search("Google")

And it will be far more tolerant of data omissions, provided that your Google_Results type makes most of the entries optional.
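Since the question does not show Google_Results, the following is only a hypothetical stand-in for what such a type could look like: a dataclass whose fields all default to None, so it accepts whatever subset of keyword arguments book_search() managed to collect. The field names are assumed to match the dictionary keys used above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Google_Results:
    # Hypothetical sketch: every field is optional because the API
    # may omit any of them for a given volume.
    isbn13: Optional[str] = None
    isbn10: Optional[str] = None
    title: Optional[str] = None
    author: Optional[str] = None
    publisher: Optional[str] = None
    published_date: Optional[str] = None
    description: Optional[str] = None
    pages: Optional[int] = None
    genre: Optional[str] = None
    language: Optional[str] = None
    image_link: Optional[str] = None


# A partial dictionary still builds a valid object; missing fields stay None.
partial = {"title": "Some Title", "language": "nl"}
book = Google_Results(**partial)
print(book.title, book.publisher)
```

A regular class with keyword arguments defaulting to None in __init__ would work just as well; the dataclass only saves some typing.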

I don't even know what to say, thanks man! I see I went about this way too fast and loose and that more logic is very much needed, and I really appreciate the way you use the dictionary to allow for missing values. Thanks for providing such a detailed response, it really helps me learn!

Vincent · Accepted Answer · 2017-07-22 21:25:25Z

Following @Coldspeed's recommendation, it became clear that missing information in the JSON response caused an exception to be raised. Since I only had a "pass" statement in the except block, the entire result was skipped silently. I will therefore have to adapt the try/except statements so that errors are handled properly.
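As a rough sketch of what "handled properly" could mean here, the example below catches only KeyError and reports which key was missing instead of silently passing. The function name and the sample records are made up for illustration.

```python
def extract_title_and_publisher(item):
    """Return (title, publisher), or None if a required key is missing."""
    try:
        info = item["volumeInfo"]
        return info["title"], info["publisher"]
    except KeyError as missing:
        # Report the problem instead of hiding it behind a silent `pass`.
        print(f"skipping record, missing key: {missing}")
        return None


# Hypothetical sample data: the first record has no "publisher",
# which mirrors the situation described in the question.
items = [
    {"volumeInfo": {"title": "First"}},
    {"volumeInfo": {"title": "Second", "publisher": "Some Press"}},
]

extracted = [extract_title_and_publisher(item) for item in items]
print(extracted)
```

Because only KeyError is caught, other problems such as a KeyboardInterrupt still propagate as usual.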

Thanks for the help guys!
