I'm writing a web scraper for an insurance website that saves the model, brand, sub-brand, and description to a CSV file. When I run my code, it sometimes works and other times fails with one of several errors ("list indices must be integers", "Expecting value: line 1 column 1", "JSON decoder is not working").

I've tried adding print statements to find where the problem is, but I still can't figure it out.

import requests
import time
import json


session = requests.Session()
request_marcas = session.get('https://www.citibanamexchubb.com/api/chubbnet/auto/brands-subbrands')
data = request_marcas.json()
fileCSV = open("webscraper_test.csv", "a")
fileCSV.write('Modelo' + ';' + 'ID_Marca' + ";" + 'ID_Submarca' + ";" + "ID_Tipo" + ";" + "Marca" +";"+ "Tipo"+ 'Descripcion' + "\n")

for i in range(2019, 2020):
        for marca in data['MARCA']:
            for submarca in marca['SUBMARCAS']:
                modelos = []
                modelos.append('https://www.citibanamexchubb.com/api/chubbnet/auto/models/' + marca['ID'] + '/' + submarca['ID'] + '/' + str(i))
                for link in modelos:
                    json_link = []
                    request_link = session.get(link).json()
                    json_link.append(request_link)
                    #print(request_link)
                    for desc_id in request_link['TIPO']:
                        #print(desc_id['ID'])
                        desc_detail = []
                        desc_detail.append(session.get('https://www.citibanamexchubb.com/api/chubbnet/auto/descriptions/' + desc_id['ID'] + '/2018').json())
                        #print(desc_detail)
                        try:
                            for desc in desc_detail['DESCRIPCION']:
                                print(desc['DESC'])
                        except Exception as e:
                            None

1 Answer


The auto/models endpoint that you're scraping is inconsistent in the shape of its responses. For instance, https://www.citibanamexchubb.com/api/chubbnet/auto/models/7/8/2019 returns this:

{
  "TIPO": {
    "ID": "381390223",
    "DESC": "MINI COOPER"
  }
}

While https://www.citibanamexchubb.com/api/chubbnet/auto/models/1/1/2019 returns this:

{
  "TIPO": [
    {
      "ID": "364026215",
      "DESC": "MDX"
    },
    {
      "ID": "364026216",
      "DESC": "RDX"
    },
    {
      "ID": "364031544",
      "DESC": "ILX"
    },
    {
      "ID": "364031613",
      "DESC": "TLX"
    },
    {
      "ID": "364031674",
      "DESC": "NSX"
    }
  ]
}

So in the first one, "TIPO" is a dict, while in the second one, "TIPO" is a list. When "TIPO" is a dict, looping over it yields its keys (plain strings) rather than items, so desc_id['ID'] raises a TypeError. I've made a modification to your script that gets it running without throwing any errors. I'm sure it's not quite what you're looking for, but it at least handles the difference between the two types:

import requests
import time
import json


session = requests.Session()
request_marcas = session.get('https://www.citibanamexchubb.com/api/chubbnet/auto/brands-subbrands')
data = request_marcas.json()
fileCSV = open("webscraper_test.csv", "a")
fileCSV.write('Modelo' + ';' + 'ID_Marca' + ";" + 'ID_Submarca' + ";" + "ID_Tipo" + ";" + "Marca" + ";" + "Tipo" + ";" + 'Descripcion' + "\n")  # added the missing ";" between Tipo and Descripcion

for i in range(2019, 2020):
        for marca in data['MARCA']:
            for submarca in marca['SUBMARCAS']:
                modelos = []
                modelos.append('https://www.citibanamexchubb.com/api/chubbnet/auto/models/' + marca['ID'] + '/' + submarca['ID'] + '/' + str(i))
                for link in modelos:
                    json_link = []
                    request_link = session.get(link).json()
                    json_link.append(request_link)
                    #print(request_link)

                    # here's where I've made some changes:
                    desc_detail = []
                    if isinstance(request_link['TIPO'], dict):
                        desc_detail.append(session.get(
                            'https://www.citibanamexchubb.com/api/chubbnet/auto/descriptions/' + request_link['TIPO'][
                                'ID'] + '/2018').json())
                        print(request_link['TIPO']['DESC'])
                    elif isinstance(request_link['TIPO'], list):
                        for item in request_link['TIPO']:
                            desc_detail.append(session.get('https://www.citibanamexchubb.com/api/chubbnet/auto/descriptions/' + item['ID'] + '/2018').json())
                            print(item['DESC'])
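
If you'd rather not repeat the request code in two branches, one option is to wrap a lone dict in a list so the rest of the loop only ever sees a list. This is just a sketch: as_list is a name I made up (it is not part of requests or of the API), and the sample data is taken from the two responses shown above, trimmed to two entries for the list case.

```python
def as_list(value):
    """Return value as a list: wrap a lone dict, pass a list through.

    Hypothetical helper, not part of requests or the scraped API.
    """
    if value is None:
        return []
    if isinstance(value, dict):
        return [value]
    return list(value)


# The two response shapes shown earlier in this answer (list case trimmed).
single = {"TIPO": {"ID": "381390223", "DESC": "MINI COOPER"}}
several = {"TIPO": [{"ID": "364026215", "DESC": "MDX"},
                    {"ID": "364026216", "DESC": "RDX"}]}

for response in (single, several):
    # Same loop body regardless of which shape the server sent.
    for item in as_list(response.get("TIPO")):
        print(item["ID"], item["DESC"])
```

In the script, that would mean looping over as_list(request_link.get('TIPO')) instead of checking isinstance twice.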

Hope that helps!
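
As for the "Expecting value: line 1 column 1" error from the question: that is the message Python's JSON decoder gives when the text it is handed is empty or isn't JSON at all. I can't tell from here why the server would sometimes send such a body, so this is only a sketch of one way to skip those responses instead of crashing. parse_json_or_none is a made-up name; in the scraper you would pass it the text of the response (response.text in requests).

```python
import json


def parse_json_or_none(text):
    """Parse text as JSON; return None if it is empty or not valid JSON.

    Hypothetical helper for illustration only.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


print(parse_json_or_none('{"TIPO": {"ID": "381390223"}}'))  # valid JSON
print(parse_json_or_none(''))                                # empty body
print(parse_json_or_none('<html></html>'))                   # not JSON
```

A None result can then be skipped with a plain "if data is None: continue" inside the loop.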


1 Comment

Thank you so much!!!! It helped me a lot in getting the result I wanted.
