I'm new to web scraping and am currently trying out this block of code:

import requests
import bs4
from bs4 import BeautifulSoup
import pandas as pd
import time

page = requests.get("https://leeweebrothers.com/our-food/lunch-boxes/#")
soup = BeautifulSoup(page.text, "html.parser")

names = soup.find_all('h2') #name of food
rest = soup.find_all('span', {'class' : 'amount'}) # price of food

for div, a in zip(names, rest):
    print(div.text, a.text) # print name / price in same line

It works great except for one problem, which I will show in the link below:

printing result of 2 for loops in same line

Next to the string "HONEY GLAZED CHICKEN WING" is a $0.00, an outlier that comes from the shopping cart app on the website (it shares the span class='amount').

How would I remove this string and "move up" the other prices so that they line up with the names of the food they belong to?

Edit: Sample output below

 Line1: HONEY GLAZED CHICKEN WING $0.00
 Line2: CRISPY CHICKEN LUNCH BOX
 Line3:                                                    $5.00
 Line4: BREADED FISH LUNCH BOX
 Line5:                                                    $5.00

My desired output would be something like:

 Line1: HONEY GLAZED CHICKEN WING                          $5.00
 Line2: CRISPY CHICKEN LUNCH BOX                           $5.00

I'm looking for a solution that removes the outlying $0.00 and moves the rest of the prices up

  • please paste a short and representative sample of your current output, as well as your intended output. Otherwise no one will get what you want. Commented Jun 7, 2018 at 4:37

3 Answers

I think you might have asked the wrong question. You can eliminate the $0.00 outlier, but your results for the prices still won't match up with the names.

To be sure that your lists of prices and names are in the same order, so that they match up, it might be easier to first search for the divs that contain both of them:

import requests
from bs4 import BeautifulSoup

page = requests.get("https://leeweebrothers.com/our-food/lunch-boxes/#")
soup = BeautifulSoup(page.text, "html.parser")

# all the divs that held the foods had this same style
divs = soup.find_all('div', {'style': 'max-height:580px;'})
names_and_prices = {
    # name: price
    div.find('h2').text: div.find('span', {'class': 'amount'}).text
    for div in divs
}
for name, price in names_and_prices.items():
    print(name, price)
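
To see why searching inside each container keeps names and prices together, here is a minimal offline sketch. The HTML string and the "cart" and "product" class names are made up for illustration and are not copied from the real site; the function name scrape_pairs is hypothetical too. It collects a list of tuples rather than a dict, so two items with the same name would not overwrite each other.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption, not the real page): one cart total
# followed by two product containers.
HTML = """
<div class="cart"><span class="amount">$0.00</span></div>
<div class="product"><h2>HONEY GLAZED CHICKEN WING LUNCH BOX</h2>
  <span class="amount"> $5.00 </span></div>
<div class="product"><h2>CRISPY CHICKEN LUNCH BOX</h2>
  <span class="amount"> $4.50 </span></div>
"""


def scrape_pairs(html):
    """Return (name, price) tuples, reading both from the same container."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for product in soup.find_all("div", class_="product"):
        name = product.find("h2")
        price = product.find("span", class_="amount")
        if name is None or price is None:
            continue  # skip containers that are missing either piece
        pairs.append((name.get_text(strip=True), price.get_text(strip=True)))
    return pairs


pairs = scrape_pairs(HTML)
for name, price in pairs:
    print(name, price)
```

Because the cart's span sits outside every product container, it is never visited, so there is nothing to filter out afterwards.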

7 Comments

thanks this was what i was looking for, I ran this block of code but it didn't put it on the same line though, do u know what i'm missing? The output is exactly as the one in my post, except it didn't include the $0.00
Hey, it's doing that because the price string from the span tag has a bunch of whitespace on both sides. Python strings have a method for getting rid of exactly that, called strip(). Try changing div.find('span', {'class': 'amount'}).text to div.find('span', {'class': 'amount'}).text.strip() with the .strip() at the end. PS, in the desired output you posted, it says the crispy chicken lunch box is $5.00. On the site it's actually $4.50. That's why I was saying be careful with the order :p
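A quick sketch of what strip() does, using a made-up padded string as a stand-in for the span's text (the exact whitespace on the real page is an assumption):

```python
# Made-up stand-in for the text of the price span, padded with
# newlines and tabs the way scraped HTML text often is.
raw_price = "\n\t\t$4.50\n\t"

# str.strip() removes leading and trailing whitespace only.
clean_price = raw_price.strip()

print(repr(raw_price))
print(repr(clean_price))
```

Printing with repr() makes the hidden newlines and tabs visible, which is handy when output mysteriously lands on separate lines.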
I guess, printed that way, it won't line up in columns like you showed.
Really appreciate your help! Thanks man, totally fixed everything with just that one text.strip()
Np! (Just PPS, if you wanted to print in nice columns, you can change the print to print("{: <50} {: >5}".format(name, price)). Basically the name and price get put in between the { } brackets, and the 50/5 give a minimum width for the two strings so that it ends up being in columns. Dunno if you need that at all tho.)
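For reference, a small sketch of that format string applied to two rows. The rows list is hard-coded here for illustration; in the real script the values would come from the scraped names and prices.

```python
# Hard-coded sample rows standing in for the scraped data.
rows = [
    ("HONEY GLAZED CHICKEN WING LUNCH BOX", "$5.00"),
    ("CRISPY CHICKEN LUNCH BOX", "$4.50"),
]

# "{: <50}" left-aligns the name in at least 50 characters;
# "{: >5}" right-aligns the price in at least 5 characters.
lines = ["{: <50} {: >5}".format(name, price) for name, price in rows]

for line in lines:
    print(line)
```

The widths are minimums, so a name longer than 50 characters would still print in full and push its price to the right.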

To get the output the way you have mentioned above, you can try like below:

import requests
from bs4 import BeautifulSoup

page = requests.get("https://leeweebrothers.com/our-food/lunch-boxes/#")
soup = BeautifulSoup(page.text, "html.parser")

for items in soup.find_all(class_='product-cat-lunch-boxes'):
    name = items.find("h2").get_text(strip=True)
    price = items.find(class_="amount").get_text(strip=True)
    print(name,price)

Results are like:

HONEY GLAZED CHICKEN WING LUNCH BOX $5.00
CRISPY CHICKEN LUNCH BOX $4.50
BREADED FISH LUNCH BOX $4.50
EGG OMELETTE LUNCH BOX $4.50
FRIED TWO-JOINT WING LUNCH BOX $4.50

Try this:

for div, a in zip(names, rest):
    if a.text.strip() and '$0.00' not in a.text:  # empty strings are falsy
        print(div.text, a.text)  # print name / price on the same line
    else:                        # optional
        print('Outlier')         # optional

Keep in mind this will ONLY work for outliers that contain '$0.00' in a.text.
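
Note that checking inside the loop only skips the pair containing the outlier; the remaining prices stay paired with the wrong names. To actually "move up" the prices, the outlier has to be dropped before zipping. A minimal sketch with plain strings standing in for the tag texts (the sample values and the assumption that the cart total is the only stray price are mine):

```python
# Plain strings standing in for div.text / a.text from the question.
names = ["HONEY GLAZED CHICKEN WING LUNCH BOX", "CRISPY CHICKEN LUNCH BOX"]
raw_prices = ["$0.00", " $5.00 ", " $4.50 "]  # cart total comes first

# Drop empty strings and the $0.00 cart total BEFORE pairing, so every
# remaining price shifts up to sit next to its name.
prices = [p.strip() for p in raw_prices if p.strip() and p.strip() != "$0.00"]

pairs = list(zip(names, prices))
for name, price in pairs:
    print(name, price)
```

This still relies on the names and prices appearing in the same order on the page, so looking both up inside the same container is the more robust option.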
