I am trying to scrape data from a few websites for a proof-of-concept project. I'm currently using Python 3 with BeautifulSoup (bs4) to collect the data I need. I have a dictionary of URLs from three sites, and each site requires a different method to extract the data because its HTML is different. I have been using a try/except/else stack, but I keep running into issues. If you could have a look at my code and help me fix it, that would be great!
As I add more sites to be scraped, I won't be able to keep using try/except/else to cycle through the various methods until one finds the data. How can I future-proof this code so that I can add as many websites as I like and scrape data from whatever elements they contain?
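To give a concrete idea of what I mean by "future-proof", I was imagining something along the lines of one small parse function per site, looked up by domain, so that adding a site only means adding an entry. This is just a sketch: the function and dict names are made up, and only the gold.co.uk selector comes from my current code.

from urllib.parse import urlparse

def parse_gold_co_uk(soup):
    # Selector taken from my current code below (inc-VAT price cell on gold.co.uk)
    return soup.find('td', {'id': 'total-price-inc-vat-1'}).get_text()

# Hypothetical registry: domain -> parser. A new site would just be a new entry here.
PARSERS = {
    "www.gold.co.uk": parse_gold_co_uk,
}

def extract_price_text(url, soup):
    # Pick the parser for this site based on the URL's host name
    return PARSERS[urlparse(url).netloc](soup)

Is that a sensible direction, or is there a better pattern for this?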
# Scraping Script Here:
import re

import requests
from bs4 import BeautifulSoup
from lxml import etree


def job():
    prices = {
        # LIVEPRICES
        "LIVEAUOZ": {"url": "https://www.gold.co.uk/",
                     "trader": "Gold.co.uk",
                     "metal": "Gold",
                     "type": "LiveAUOz"},
        # GOLD
        "GLDAU_BRITANNIA": {"url": "https://www.gold.co.uk/gold-coins/gold-britannia-coins/britannia-one-ounce-gold-coin-2020/",
                            "trader": "Gold.co.uk",
                            "metal": "Gold",
                            "type": "Britannia"},
        "GLDAU_PHILHARMONIC": {"url": "https://www.gold.co.uk/gold-coins/austrian-gold-philharmoinc-coins/austrian-gold-philharmonic-coin/",
                               "trader": "Gold.co.uk",
                               "metal": "Gold",
                               "type": "Philharmonic"},
        "GLDAU_MAPLE": {"url": "https://www.gold.co.uk/gold-coins/canadian-gold-maple-coins/canadian-gold-maple-coin/",
                        "trader": "Gold.co.uk",
                        "metal": "Gold",
                        "type": "Maple"},
        # SILVER
        "GLDAG_BRITANNIA": {"url": "https://www.gold.co.uk/silver-coins/silver-britannia-coins/britannia-one-ounce-silver-coin-2020/",
                            "trader": "Gold.co.uk",
                            "metal": "Silver",
                            "type": "Britannia"},
        "GLDAG_PHILHARMONIC": {"url": "https://www.gold.co.uk/silver-coins/austrian-silver-philharmonic-coins/silver-philharmonic-2020/",
                               "trader": "Gold.co.uk",
                               "metal": "Silver",
                               "type": "Philharmonic"}
    }
    response = requests.get(
        'https://www.gold.co.uk/silver-price/')
    soup = BeautifulSoup(response.text, 'html.parser')
    AG_GRAM_SPOT = soup.find(
        'span', {'name': 'current_price_field'}).get_text()
    # Convert to float
    AG_GRAM_SPOT = float(re.sub(r"[^0-9\.]", "", AG_GRAM_SPOT))
    # No need for another lookup
    AG_OUNCE_SPOT = AG_GRAM_SPOT * 31.1035
    for coin in prices:
        response = requests.get(prices[coin]["url"])
        soup = BeautifulSoup(response.text, 'html.parser')
        try:
            text_price = soup.find(
                'td', {'id': 'total-price-inc-vat-1'}).get_text()  # <-- Method 1
        except:
            text_price = soup.find(
                'td', {'id': 'total-price-inc-vat-1'}).get_text()  # <-- Method 2
        else:
            text_price = soup.find(
                'td', {'class': 'gold-price-per-ounce'}).get_text()
        # Grab the number
        prices[coin]["price"] = float(re.sub(r"[^0-9\.]", "", text_price))
    # ========================================================================
    root = etree.Element("root")
    for coin in prices:
        coinx = etree.Element("coin")
        etree.SubElement(coinx, "trader", {
            'variable': coin}).text = prices[coin]["trader"]
        etree.SubElement(coinx, "metal").text = prices[coin]["metal"]
        etree.SubElement(coinx, "type").text = prices[coin]["type"]
        etree.SubElement(coinx, "price").text = (
            "£") + str(prices[coin]["price"])
        root.append(coinx)
    fName = './templates/data.xml'
    with open(fName, 'wb') as f:
        f.write(etree.tostring(root, xml_declaration=True,
                               encoding="utf-8", pretty_print=True))
.find returns None if nothing is found, so maybe you can just use an ordinary if/elif/else?
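A minimal sketch of that suggestion, reusing the two selectors already in the code above; it would replace the try/except/else inside the for coin in prices: loop, with everything else left unchanged:

        # soup.find returns None when the element is missing, so test each selector in turn
        inc_vat_cell = soup.find('td', {'id': 'total-price-inc-vat-1'})
        per_ounce_cell = soup.find('td', {'class': 'gold-price-per-ounce'})

        if inc_vat_cell is not None:
            text_price = inc_vat_cell.get_text()
        elif per_ounce_cell is not None:
            text_price = per_ounce_cell.get_text()
        else:
            raise ValueError(f"No known price element on {prices[coin]['url']}")

        # Grab the number
        prices[coin]["price"] = float(re.sub(r"[^0-9\.]", "", text_price))

Raising in the final else makes an unrecognised page layout fail loudly instead of silently overwriting the price, which is what the current try/except/else ends up doing.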