I'm writing a web scraping program to collect data from truecar.com. My database has 3 columns, and when I run the program I get this error: list index out of range. Here is what I've done so far:

import mysql.connector
from bs4 import BeautifulSoup
import requests
import re

# take the car's name
requested_car_name = input()

# inject the car's name into the URL

my_request = requests.get('https://www.truecar.com/used-cars-for-sale/listings/' +
                          requested_car_name + '/location-holtsville-ny/?sort[]=best_match')


my_soup = BeautifulSoup(my_request.text, 'html.parser')

# ************ car_model column in database ******************
car_model = my_soup.find_all(
    'span', attrs={'class': 'vehicle-header-make-model text-truncate'})

# we have a list of car models
car_list = []
for item in range(20):
    # appends car_model to car_list
    car_list.append(car_model[item].text)

car_string = ', '.join('?' * len(car_list))


# ************** price column in database *****************************
price = my_soup.find_all(
    'div', attrs={'data-test': 'vehicleCardPricingBlockPrice'})
price_list = []
for item in range(20):
    # appends price to price_list
    price_list.append(price[item].text)

price_string = ', '.join('?' * len(price_list))


# ************** distance column in database ***************************
distance = my_soup.find_all('div', attrs={'data-test': 'vehicleMileage'})
distance_list = []
for item in range(20):
    # appends distance to distance_list
    distance_list.append(distance[item].text)

distance_string = ', '.join('?' * len(distance_list))

# check the connection
print('CONNECTING ...')

mydb = mysql.connector.connect(
    host="xxxxx",
    user="xxxxxx",
    password="xxxxxx",
    port='xxxxxx',
    database='xxxxxx'
)

print('CONNECTED')

# checking the connection is done

my_cursor = mydb.cursor(buffered=True)
insert_command = 'INSERT INTO car_name (car_model, price, distance) VALUES (%s, %s, %s);' % (car_string, price_string, distance_string)
# values = (car_string, price_string, distance_string)
my_cursor.execute(insert_command, car_list, price_list, distance_list)
mydb.commit()

print(my_cursor.rowcount, "Record Inserted")

mydb.close()

I also have another problem: I can't insert a list into my columns. I have tried many ways, but unfortunately I wasn't able to get it working.

I think the problem is in this line:

IndexError                                Traceback (most recent call last)
<ipython-input-1-4a3930bf0f57> in <module>
     23 for item in range(20):
     24     # appends car_model to car_list
---> 25     car_list.append(car_model[item].text)
     26 
     27 car_string = ', '.join('?' * len(car_list))

IndexError: list index out of range

I don't want it to insert the whole list into 1 row in the database. I want the price, model, and mileage of the first 20 cars on truecar.com in my database.

  • Can you paste the complete traceback to show exactly which line is causing the issue? Commented Feb 17, 2021 at 12:14
  • Man, could you give us the traceback? Commented Feb 17, 2021 at 12:24
  • Sorry, I had a problem with my internet connection. Commented Feb 17, 2021 at 12:29
  • Change line 25 to: for item in range(len(car_model)): You can't hardcode it to 20. Commented Feb 17, 2021 at 12:31
  • Change line 25 to what? Commented Feb 17, 2021 at 12:33

2 Answers

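Yes, you are hard-coding the length. Change how you iterate through your soup elements: loop over the results directly instead of over range(20), and pair the three lists up with zip so that executemany inserts one row per car. So: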

import mysql.connector
from bs4 import BeautifulSoup
import requests


# take the car's name
requested_car_name = input('Enter car name: ')

# inject the car's name into the URL

my_request = requests.get('https://www.truecar.com/used-cars-for-sale/listings/' +
                          requested_car_name + '/location-holtsville-ny/?sort[]=best_match')


my_soup = BeautifulSoup(my_request.text, 'html.parser')

# ************ car_model column in database ******************
car_model = my_soup.find_all(
    'span', attrs={'class': 'vehicle-header-make-model text-truncate'})

# we have a list of car models
car_list = []
for item in car_model:
    # appends car_model to car_list
    car_list.append(item.text)




# ************** price column in database *****************************
price = my_soup.find_all(
    'div', attrs={'data-test': 'vehicleCardPricingBlockPrice'})
price_list = []
for item in price:
    # appends price to price_list
    price_list.append(item.text)




# ************** distance column in database ***************************
distance = my_soup.find_all('div', attrs={'data-test': 'vehicleMileage'})
distance_list = []
for item in distance:
    # appends distance to distance_list
    distance_list.append(item.text)




# check the connection
print('CONNECTING ...')

mydb = mysql.connector.connect(
    host="xxxxx",
    user="xxxxxx",
    password="xxxxxx",
    port='xxxxxx',
    database='xxxxxx'
)

print('CONNECTED')

# checking the connection is done

my_cursor = mydb.cursor(buffered=True)
insert_command = 'INSERT INTO car_name (car_model, price, distance) VALUES (%s, %s, %s)'
values = list(zip(car_list, price_list, distance_list))
my_cursor.executemany(insert_command, values)
mydb.commit()

print(my_cursor.rowcount, "Record Inserted")

mydb.close()
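
To see what zip and executemany do with the three lists, here is a minimal sketch that runs without a MySQL server. It swaps in Python's built-in sqlite3 module as a stand-in, so the placeholder is ? rather than the %s that mysql.connector expects, and the sample cars, prices, and mileages are made up for illustration.

```python
import sqlite3

# Made-up sample data standing in for the scraped lists
car_list = ['BMW X5', 'BMW 3 Series', 'BMW X3']
price_list = ['$30,000', '$21,500', '$27,900']
distance_list = ['40,000 miles', '62,000 miles', '35,500 miles']

# zip pairs the lists position by position: one tuple per car
values = list(zip(car_list, price_list, distance_list))

# In-memory SQLite database used only as a stand-in for the MySQL table
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE car_name (car_model TEXT, price TEXT, distance TEXT)')

# executemany runs the INSERT once per tuple, so each car becomes its own row
# (sqlite3 uses ? placeholders; mysql.connector uses %s)
conn.executemany(
    'INSERT INTO car_name (car_model, price, distance) VALUES (?, ?, ?)',
    values,
)
conn.commit()

stored = conn.execute('SELECT car_model, price, distance FROM car_name').fetchall()
print(len(stored), 'rows inserted')
conn.close()
```

Because zip stops at the shortest list, a page that returns fewer prices than models produces fewer rows instead of raising an IndexError.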

ALTERNATE:

There's also an API from which you can fetch the data:

import mysql.connector
import requests
import math


# take the car's name
requested_car_name = input('Enter car name: ')

# inject the car's name into the URL

url = 'https://www.truecar.com/abp/api/vehicles/used/listings'
payload = {
'city': 'holtsville',
'collapse': 'true',
'fallback': 'true',
'include_incentives': 'true',
'include_targeted_incentives': 'true',
'make_slug': requested_car_name,
'new_or_used': 'u',
'per_page': '30',
'postal_code': '',
'search_event': 'true',
'sort[]': 'best_match',
'sponsored': 'true',
'state': 'ny',
'page':'1'}


jsonData = requests.get(url, params=payload).json()
total = jsonData['total']
total_pages = math.ceil(total/30)

total_pages_input = input('There are %s pages to iterate.\nEnter the number of pages to go through or type ALL: ' %total_pages)
# keep the computed page count for ALL, otherwise use the number entered
if total_pages_input.upper() != 'ALL':
    total_pages = int(total_pages_input)

values = []
for page in range(1,total_pages+1):
    if page == 1:
        car_listings = jsonData['listings']
    else:
        payload.update({'page':'%s' %page})
        jsonData = requests.get(url, params=payload).json()
        car_listings = jsonData['listings']
        
    for listing in car_listings:
        vehicle = listing['vehicle']
        
        ex_color = vehicle['exterior_color']
        in_color = vehicle['interior_color']
        location = vehicle['location']
        price = vehicle['list_price']
        make = vehicle['make']
        model = vehicle['model']
        mileage = vehicle['mileage']
        style = vehicle['style']
        year = vehicle['year']
        
        engine = vehicle['engine']
        accidentCount = vehicle['condition_history']['accidentCount']
        ownerCount = vehicle['condition_history']['ownerCount']
        isCleanTitle = vehicle['condition_history']['titleInfo']['isCleanTitle']
        isFrameDamaged = vehicle['condition_history']['titleInfo']['isFrameDamaged']
        isLemon = vehicle['condition_history']['titleInfo']['isLemon']
        isSalvage = vehicle['condition_history']['titleInfo']['isSalvage']
        isTheftRecovered = vehicle['condition_history']['titleInfo']['isTheftRecovered']
        
        values.append((ex_color, in_color,location,price,make,model,mileage,
        style,year,engine,accidentCount,ownerCount,isCleanTitle,isFrameDamaged,
        isLemon, isSalvage,isTheftRecovered))
    print('Completed: Page %s of %s' %(page,total_pages))
        
        
# check the connection
print('CONNECTING ...')

mydb = mysql.connector.connect(
    host="xxxxx",
    user="xxxxxx",
    password="xxxxxx",
    port='xxxxxx',
    database='xxxxxx'
)

print('CONNECTED')

# checking the connection is done

my_cursor = mydb.cursor(buffered=True)

# create_command = ''' create table car_information (exterior_color varchar(255), interior_color varchar(255),location varchar(255),price varchar(255),make varchar(255),model varchar(255),mileage varchar(255),
#         style varchar(255),year varchar(255),engine varchar(255),accidentCount varchar(255),ownerCount varchar(255),isCleanTitle varchar(255),isFrameDamaged varchar(255),
#         isLemon varchar(255), isSalvage varchar(255),isTheftRecovered varchar(255))'''

# my_cursor.execute(create_command)
# print('created')


insert_command = '''INSERT INTO car_information (exterior_color, interior_color,location,price,make,model,mileage,
        style,year,engine,accidentCount,ownerCount,isCleanTitle,isFrameDamaged,
        isLemon, isSalvage,isTheftRecovered) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)'''
my_cursor.executemany(insert_command, values)
mydb.commit()

print(my_cursor.rowcount, "Record Inserted")

mydb.close()             
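
The loop above indexes straight into nested keys such as vehicle['condition_history']['titleInfo']['isLemon'], which raises a KeyError if a listing lacks one of them. Whether the API ever omits these fields is an assumption here, not something verified. If it does, a small helper along these lines could return a default instead. The name dig and the sample listing are hypothetical.

```python
def dig(mapping, *keys, default=None):
    """Follow keys through nested dicts, returning default if any level is missing."""
    current = mapping
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Made-up listing shaped like the fields read in the loop above
vehicle = {
    'make': 'BMW',
    'condition_history': {
        'accidentCount': 0,
        'titleInfo': {'isLemon': False},
    },
}

is_lemon = dig(vehicle, 'condition_history', 'titleInfo', 'isLemon')
is_salvage = dig(vehicle, 'condition_history', 'titleInfo', 'isSalvage')
print(is_lemon, is_salvage)
```

A missing value then comes back as None, which a database driver would typically store as NULL rather than aborting the whole run.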

11 Comments

I got this error with your changes : ResultSet object has no attribute 'text'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?
Sorry, I had a typo. Fixed it above.
It gives me question marks in the database, same as with Raphael Eckert's answer.
Well, you're joining the strings with car_string = ', '.join('?' * len(car_list)), so the question marks themselves get stored. I edited my solution.
Data too long for column 'car_model' at row 1

The problem seems to be that the list of car models has fewer than 20 entries.

for item in range(20):
  car_list.append(car_model[item].text)

This always tries to append exactly 20 items to the car list. If the page returned fewer than 20 entries, it raises an IndexError: with only 10 entries the valid indexes are 0 through 9, so car_model[10].text does not exist. You can try looping over the actual length instead:

for item in range(len(car_model)):
  car_list.append(car_model[item].text)
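
If the goal is still "at most the first 20", a slice is one way to get that without an IndexError, because slicing past the end of a list simply returns what is there. This is a minimal sketch: first_n_texts is a hypothetical helper, and SimpleNamespace objects stand in for the elements that find_all() returns so the example runs without fetching the page.

```python
from types import SimpleNamespace


def first_n_texts(elements, n=20):
    """Return the text of at most n elements; slicing never raises IndexError."""
    return [element.text for element in elements[:n]]


# Stand-in for a find_all() result that has only 3 matches
car_model = [
    SimpleNamespace(text=name)
    for name in ('BMW X5', 'BMW X3', 'BMW 3 Series')
]

car_list = first_n_texts(car_model)
print(car_list)
```

With 3 matches the list has 3 entries, and with 30 matches it is capped at 20.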

7 Comments

in database it is all question marks like this => 5 ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ? ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ? ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?
Can you print(car_model) right after you pull it from the db, to see what actually got pulled?
I entered bmw and it gave me 1 Record Inserted and in database I have this: '6', '?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?', '?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?', '?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?'
I don't want it to insert the whole list into 1 row in the database. I want the price, model, and mileage of the first 20 cars on truecar.com in my database.
"Can you print(car_model) right after you pull it from the db, to see what actually got pulled?" Can you still do this? I am not talking about inserting; I am talking about the format and content of this specific variable right before the appending.