I want to scrape a website and put the desired data into a JSON file. The issue I've encountered is that I only get text, which I can only print. I need to write only specific data to the JSON file and reuse that data with my classes. The website I'm scraping and my code:

import requests
from bs4 import BeautifulSoup

URL = 'https://lt.brcauto.eu/automobiliu-paieska/'

req = requests.get(URL)
soup = BeautifulSoup(req.text, 'lxml')

pages = soup.find_all('li', class_ = 'page-item')[-2] # the last item is the ">" link, so the second-to-last one holds the highest page number

cars_printed_counter = 0 

for number in range(1, int(pages.text)):
    req = requests.get(URL + '?page=' + str(number))
    soup = BeautifulSoup(req.text, 'lxml')

    if cars_printed_counter == 20:
        break

    for single_car in soup.find_all('div', class_ = 'cars-wrapper'):

        if cars_printed_counter == 20:
            break

        Car_Title = single_car.find('h2', class_ = 'cars__title')
        Car_Specs = single_car.find('p', class_ = 'cars__subtitle')

        print('\nCar number:', cars_printed_counter + 1)

        print(Car_Title.text)
        print(Car_Specs.text)

        cars_printed_counter += 1

The data I get looks like this (printed results):

Car number: 19

BMW 520 Gran Turismo M-Sport

2013 | 2.0 Diesel | Automation | 255229 km | 135 kW (184 AG) | Black

Car number: 20

BMW 750 i Automation

2005 | 5.0 Gasoline | Automation | 343906 km | 270 kW (367 AG) | Grey

My question is: how should I write the data to a JSON file so that it looks like this (desired JSON):

[
{
    "fuel": "diesel",
    "title": "BMW 520 Gran Turismo M-Sport",
    "year": 2013,
    "run": 255229,
    "type": "Black"
},
{
    "fuel": "gasoline",
    "title": "BMW 750 i Automation",
    "year": 2005,
    "run": 343906,
    "type": "Grey"
}
]

2 Answers


You could do something like this, storing each car in a dict. See the Python documentation on how to create dicts.

import json

# this list stores one dict per car
list_of_printed_cars = []

# soup and cars_printed_counter come from the code in the question
for single_car in soup.find_all('div', class_ = 'cars-wrapper'):

    if cars_printed_counter == 20:
        break

    Car_Title = single_car.find('h2', class_ = 'cars__title')
    Car_Specs = single_car.find('p', class_ = 'cars__subtitle')

    # printed_car is a dictionary of the car's title and specs
    printed_car = {
        'title': Car_Title.text,
        'specs': Car_Specs.text
    }

    # append the dict to the list that stores every car
    list_of_printed_cars.append(printed_car)
    cars_printed_counter += 1


# json.dump serializes list_of_printed_cars to JSON and writes it to the file
with open('data.json', 'w') as f:
    json.dump(list_of_printed_cars, f)

list_of_printed_cars is a list of dicts, so json.dump can serialize it to JSON and write it straight to a file.
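
Since the question also mentions reusing the data with classes, here is a minimal sketch of a round trip through the file. The Car dataclass and the save_cars / load_cars helpers are hypothetical names made up for illustration, and the two fields simply mirror the 'title' and 'specs' keys used above; adapt them to whatever your own classes look like.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Car:
    # Hypothetical class; the fields mirror the 'title' and 'specs' dict keys.
    title: str
    specs: str


def save_cars(cars, path):
    """Write a list of Car objects to `path` as a JSON array of objects."""
    with open(path, "w", encoding="utf-8") as f:
        # asdict() turns each dataclass instance into a plain dict,
        # which json.dump knows how to serialize.
        json.dump([asdict(car) for car in cars], f, ensure_ascii=False, indent=4)


def load_cars(path):
    """Read the JSON array back and rebuild one Car per object."""
    with open(path, encoding="utf-8") as f:
        return [Car(**item) for item in json.load(f)]
```

With this in place, save_cars([Car(**d) for d in list_of_printed_cars], 'data.json') would write the file, and load_cars('data.json') would give back Car objects in a later run.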


2 Comments

OP wants to store it in the json file so it's worth adding with open('data.json', 'w') as f: json.dump(list_of_printed_cars, f)
ha literally just added json.dumps() into my answer

Straight to the point:

import json

import requests
from bs4 import BeautifulSoup

URL = 'https://lt.brcauto.eu/automobiliu-paieska/'

req = requests.get(URL)
soup = BeautifulSoup(req.text, 'lxml')

# the last pagination item is the ">" link, so the second-to-last one holds the highest page number
pages = soup.find_all('li', class_='page-item')[-2]

cars_printed_counter = 0
out = []  # collects one dict per car; created once so it is not reset on every page

for number in range(1, int(pages.text)):
    req = requests.get(URL + '?page=' + str(number))
    soup = BeautifulSoup(req.text, 'lxml')

    if cars_printed_counter == 20:
        break

    for single_car in soup.find_all('div', class_='cars-wrapper'):

        if cars_printed_counter == 20:
            break

        Car_Title = single_car.find('h2', class_='cars__title')
        Car_Specs = single_car.find('p', class_='cars__subtitle')

        print('\nCar number:', cars_printed_counter + 1)

        print(Car_Title.text)
        print(Car_Specs.text)

        # specs look like: 2013 | 2.0 Diesel | Automation | 255229 km | 135 kW (184 AG) | Black
        car = {}
        car["title"] = Car_Title.text.strip()
        subs = Car_Specs.text.strip().split(' | ')
        car["year"] = int(subs[0])                   # "2013" -> 2013
        car["fuel"] = subs[1].split(" ")[1].lower()  # "2.0 Diesel" -> "diesel"
        car["run"] = int(subs[3].split(" ")[0])      # "255229 km" -> 255229
        car["type"] = subs[5]
        car["number"] = cars_printed_counter + 1
        out.append(car)
        cars_printed_counter += 1

print(json.dumps(out))
with open("outfile.json", "w") as f:
    f.write(json.dumps(out))

Explanation: We create a list, out, that will hold all the cars. As we loop over them, we create a dictionary for each car with the values we want. Since the specs come as a single string, we split that string on " | " to get the separate components, then map each component to a key in the dict. We then append that dict to out. Once the loop is done, we have a list of dicts containing all the info we need. Finally, we call json.dumps() on that list to get the JSON and save it to a file.
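
The splitting step can also be pulled out into a small function so it can be tried on the sample strings without hitting the site. This is only a sketch: parse_specs is a hypothetical helper name, and it assumes every specs string has exactly the six "|"-separated fields shown in the question's output, with the mileage written as plain digits followed by "km".

```python
def parse_specs(title, specs):
    """Turn a title and a 'year | engine | gearbox | mileage | power | colour'
    string into a dict shaped like the desired JSON in the question."""
    parts = [part.strip() for part in specs.split("|")]
    if len(parts) != 6:
        # Assumption: six fields, as in the question's sample output.
        raise ValueError(f"expected 6 fields, got {len(parts)}: {specs!r}")

    year, engine, _gearbox, mileage, _power, colour = parts
    return {
        "fuel": engine.split()[-1].lower(),  # "2.0 Diesel" -> "diesel"
        "title": title.strip(),
        "year": int(year),                   # "2013" -> 2013
        "run": int(mileage.split()[0]),      # "255229 km" -> 255229
        "type": colour,
    }
```

Inside the scraping loop this would be called as parse_specs(Car_Title.text, Car_Specs.text). Raising on an unexpected field count is a deliberate choice here, so that a listing with a different layout fails loudly instead of silently writing wrong values.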
