
I am trying to scrape the real-time commodity values from the webpage http://www.mcxliverates.in, which shows them in an iframe at http://213.136.84.136:8000/

This is what I tried with BeautifulSoup:

from bs4 import BeautifulSoup
#import time
import urllib

data = []
url=urllib.urlopen("http://213.136.84.136:8000/")
html=url.read()
url.close()
soup = BeautifulSoup(html,"html.parser")
span=soup.find('table', attrs={'class':'table2'})
table_body = span.find('tbody')

rows = table_body.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele]) # Get rid of empty values

print([data])

Below is the output I'm getting:

[[[], [u'INTERNATIONAL MARKET'], [u'SPOT Gold'], [u'SPOT Silver'], [u'CrudeOil'], [u'Copper'], [u'NaturalGas'], [u'Dow Jones'], [u'Bank Nifty'], [u'INDIAN MARKET'], [u'MCXGold'], [u'MCXSilver'], [u'MCXCrudeOil'], [u'MCXCopper'], [u'MCXLead'], [u'MCXNickel'], [u'MCXZinc'], [u'MCXNaturalGas'], [u'MCXAluminium'], [u'MCXMenthaOil'], [u'USDINR'], [], [u"Disclaimer: We can't assure any guarantee about the accuaracy of the data."]]]

It's not returning any quotes, only the symbol names.

Here is the HTML code:

<table class="table2" border="1">
    <tbody>
        <tr style="background-color: #DFD9D9; font-size: 16px; ">
            <th style="border:4px solid white; text-align:left; ">SYMBOL</th>
            <th style="border:4px solid white; text-align:right;">LTP</th>
            <th style="border:4px solid white; text-align:right;">HIGH</th>
            <th style="border:4px solid white; text-align:right;">LOW</th>
        </tr>
        <tr style="font-size: 13px; color: black;">
            <td colspan="4" style="color:black; text-align:center; font-size:initial; background-color:lightyellow;">INTERNATIONAL MARKET</td>
        </tr>
        <tr style="font-size: 15px; color: black;">
            <td style="color:#3399ff; background-color:#E0F5D6;">SPOT Gold</td>
            <td id="goldsell" style="text-align:right;"></td>
            <td id="goldhigh" style="text-align:right;"></td>
            <td id="goldlow" style="text-align:right;"></td>

        </tr>
        <tr style="font-size: 15px; color: black;">
            <td style="color:#3399ff; background-color:#E0F5D6;">SPOT Silver</td>
            <td id="Silversell" style="text-align:right;"></td>
            <td id="Silverhigh" style="text-align:right;"></td>
            <td id="Silverlow" style="text-align:right;"></td>

        </tr>
        <tr style="font-size: 15px; color: black;">
            <td style="color:#3399ff; background-color:#E0F5D6;">CrudeOil</td>
            <td id="CrudeOilsell" style="text-align:right;"></td>
            <td id="CrudeOilhigh" style="text-align:right;"></td>
            <td id="CrudeOillow" style="text-align:right;"></td>

        </tr>
        <tr style="font-size: 15px; color: black;">
            <td style="color:#3399ff; background-color:#E0F5D6;">Copper</td>
            <td id="Coppersell" style="text-align:right;"></td>
            <td id="Copperhigh" style="text-align:right;"></td>
            <td id="Copperlow" style="text-align:right;"></td>

        </tr>
        <tr style="font-size: 15px; color: black;">
            <td style="color:#3399ff; background-color:#E0F5D6;">NaturalGas</td>
            <td id="NaturalGassell" style="text-align:right;"></td>
            <td id="NaturalGashigh" style="text-align:right;"></td>
            <td id="NaturalGaslow" style="text-align:right;"></td>
        </tr>
        <tr style="font-size: 15px; color: black;">
            <td style="color:#3399ff; background-color:#E0F5D6;">Dow Jones</td>
            <td id="DJsell" style="text-align:right;"></td>
            <td id="DJhigh" style="text-align:right;"></td>
            <td id="DJlow" style="text-align:right;"></td>

        </tr>
        <tr style="font-size: 15px; color: black;">
            <td style="color:#3399ff; background-color:#E0F5D6;">Bank Nifty</td>
            <td id="BNsell" style="text-align:right;"></td>
            <td id="BNhigh" style="text-align:right;"></td>
            <td id="BNlow" style="text-align:right;"></td>
        </tr>
        <tr style="font-size: 13px; color: black;">
            <td colspan="4" style="color:black; text-align:center; font-size:initial; background-color:lightyellow">INDIAN MARKET</td>
        </tr>
        <tr style="font-size: 15px; color: black;">
            <td style="color:#3399ff; background-color:#E0F5D6;">MCXGold</td>
            <td id="MCXGoldsell" style="text-align:right;"></td>
            <td id="MCXGoldhigh" style="text-align:right;"></td>
            <td id="MCXGoldlow" style="text-align:right;"></td>

        </tr>
        <tr style="font-size: 15px; color: black;">
            <td style="color:#3399ff; background-color:#E0F5D6;">MCXSilver</td>
            <td id="MCXSilversell" style="text-align:right;"></td>
            <td id="MCXSilverhigh" style="text-align:right;"></td>
            <td id="MCXSilverlow" style="text-align:right;"></td>

        </tr>
        <tr style="font-size: 15px;color: black;">
            <td style="color:#3399ff; background-color:#E0F5D6;">MCXCrudeOil</td>
            <td id="MCXCrudeOilsell" style="text-align:right;"></td>
            <td id="MCXCrudeOilhigh" style="text-align:right;"></td>
            <td id="MCXCrudeOillow" style="text-align:right;"></td>

        </tr>
        <tr style="font-size: 15px;color: black;">
            <td style="color:#3399ff; background-color:#E0F5D6;">MCXCopper</td>
            <td id="MCXCoppersell" style="text-align:right;"></td>
            <td id="MCXCopperhigh" style="text-align:right;"></td>
            <td id="MCXCopperlow" style="text-align:right;"></td>

        </tr>
        <tr style="font-size: 15px;color: black;">
            <td style="color:#3399ff; background-color:#E0F5D6;">MCXLead</td>
            <td id="MCXLeadsell" style="text-align:right;"></td>
            <td id="MCXLeadhigh" style="text-align:right;"></td>
            <td id="MCXLeadlow" style="text-align:right;"></td>

        </tr>
        <tr style="font-size: 15px;color: black;">
            <td style="color:#3399ff; background-color:#E0F5D6;">MCXNickel</td>
            <td id="MCXNickelsell" style="text-align:right;"></td>
            <td id="MCXNickelhigh" style="text-align:right;"></td>
            <td id="MCXNickellow" style="text-align:right;"></td>

        </tr>
        <tr style="font-size: 15px;color: black;">
            <td style="color:#3399ff; background-color:#E0F5D6;">MCXZinc</td>
            <td id="MCXZincsell" style="text-align:right;"></td>
            <td id="MCXZinchigh" style="text-align:right;"></td>
            <td id="MCXZinclow" style="text-align:right;"></td>

        </tr>
        <tr style="font-size: 15px;color: black;">
            <td style="color:#3399ff; background-color:#E0F5D6;">MCXNaturalGas</td>
            <td id="MCXNaturalGassell" style="text-align:right;"></td>
            <td id="MCXNaturalGashigh" style="text-align:right;"></td>
            <td id="MCXNaturalGaslow" style="text-align:right;"></td>
        </tr>
        <tr style="font-size: 15px;color: black;">
            <td style="color:#3399ff; background-color:#E0F5D6;">MCXAluminium</td>
            <td id="MCXAluminiumsell" style="text-align:right;"></td>
            <td id="MCXAluminiumhigh" style="text-align:right;"></td>
            <td id="MCXAluminiumlow" style="text-align:right;"></td>
        </tr>
        <tr style="font-size: 15px;color: black;">
            <td style="color:#3399ff; background-color:#E0F5D6;">MCXMenthaOil</td>
            <td id="MCXMenthaOilsell" style="text-align:right;"></td>
            <td id="MCXMenthaOilhigh" style="text-align:right;"></td>
            <td id="MCXMenthaOillow" style="text-align:right;"></td>
        </tr>
        <tr style="font-size: 15px;color: black;">
            <td style="color:#3399ff; background-color:#E0F5D6;">USDINR</td>
            <td id="USDINRsell" style="text-align:right;"></td>
            <td id="USDINRhigh" style="text-align:right;"></td>
            <td id="USDINRlow" style="text-align:right;"></td>

        </tr>

        <tr>
            <td id="lastupdate" colspan="4" style="color:grey; text-align:center; background-color:lavender;">
            </td>
        </tr>

        <tr>
            <td id="disclaimer" colspan="4" style="color:red; text-align:center; background-color:lavender;">
                Disclaimer: We can't assure any guarantee about the accuaracy of the data.
            </td>
        </tr>

    </tbody>
</table>

Does anyone know another method to get these real-time quotes?

  • The URL you provided has restricted access, so if you're looking for help, it would be best to include a snippet of the HTML in your question. Commented Feb 22, 2018 at 16:04
  • Thanks for providing the HTML snippet. It looks like this is just a template, and the actual quote data are added to the DOM with JavaScript; that is how the data provider delivers updates. See? <td id="goldhigh" style="text-align:right;"></td> There's no content in the td tag. Commented Feb 22, 2018 at 16:36
  • Yes, exactly! Is there any way to get the data that the JavaScript fills in? Commented Feb 22, 2018 at 16:41
  • Selenium. Commented Feb 22, 2018 at 17:22
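To confirm the diagnosis in the comments without launching a browser, one quick check is to list which id-bearing cells are empty in the raw HTML. This is a minimal sketch using the standard library's html.parser (swapped in for BeautifulSoup so it runs with no extra packages); `EmptyCellFinder` and `find_empty_cells` are hypothetical names.

```python
from html.parser import HTMLParser


class EmptyCellFinder(HTMLParser):
    """Collect the id of every <td id="..."> whose text content is blank."""

    def __init__(self):
        super().__init__()
        self.current_id = None  # id of the <td> we are inside, if it has one
        self.text = ""          # text gathered inside that <td>
        self.empty_ids = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.current_id = dict(attrs).get("id")
            self.text = ""

    def handle_data(self, data):
        # Only accumulate text while inside a <td> that has an id
        if self.current_id is not None:
            self.text += data

    def handle_endtag(self, tag):
        if tag == "td" and self.current_id is not None:
            if not self.text.strip():
                self.empty_ids.append(self.current_id)
            self.current_id = None


def find_empty_cells(markup):
    """Return the ids of <td> cells that contain no visible text."""
    parser = EmptyCellFinder()
    parser.feed(markup)
    parser.close()
    return parser.empty_ids
```

Run against the HTML in the question, it would report every `...sell`/`...high`/`...low` id plus `lastupdate`, which is consistent with the values being inserted later by a script rather than served in the HTML.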

1 Answer

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# To run Chrome headless, uncomment the next two lines and create the
# driver with webdriver.Chrome(options=chrome_options) instead.
# chrome_options = webdriver.ChromeOptions()
# chrome_options.add_argument('--headless')
driver = webdriver.Chrome()

url = "http://www.mcxliverates.in/"
try:
    driver.get(url)
    wait = WebDriverWait(driver, 10)

    # The table that holds our data is inside this iframe: wait for it, then switch into it
    xpath = '/html/body/div[1]/div[4]/div[1]/iframe'
    wait.until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, xpath)))

    # The <tbody> is part of the static template, so its presence alone does not mean
    # the quotes have arrived; wait until JavaScript has filled in a price cell
    wait.until(lambda d: d.find_element(By.ID, 'goldsell').text.strip())
    html = driver.page_source  # source of the iframe document we switched into
finally:
    driver.quit()  # quit() ends the whole browser session; close() only closes the window

soup = BeautifulSoup(html, "html.parser")
table = soup.find('table', class_='table2')  # BeautifulSoup's built-in way of specifying class
table_body = table.find('tbody')

data = []
rows = table_body.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele])  # Get rid of empty values

print([data])

Output:

[[[], ['INTERNATIONAL MARKET'], ['SPOT Gold', '1330.09', '1335.92', '1320.63'], ['SPOT Silver', '16.585', '16.734', '16.338'], ['CrudeOil', '62.82', '63.02', '60.71'], ['Copper', '3.2300', '3.2304', '3.1430'], ['NaturalGas', '2.660', '2.684', '2.638'], ['Dow Jones', '25034', '25250', '24572'], ['Bank Nifty', '24955.0', '24957.0', '24763.0'], ['INDIAN MARKET'], ['MCXGold', '30564', '30575', '30400'], ['MCXSilver', '38713', '38755', '38250'], ['MCXCrudeOil', '4092', '4104', '3974'], ['MCXCopper', '464.85', '464.85', '453.30'], ['MCXLead', '166.00', '166.30', '161.65'], ['MCXNickel', '895.00', '896.30', '871.10'], ['MCXZinc', '231.60', '231.60', '226.00'], ['MCXNaturalGas', '172.00', '173.10', '170.90'], ['MCXAluminium', '142.10', '142.35', '140.65'], ['MCXMenthaOil', '1252.00', '1299.80', '1246.00'], ['USDINR', '64.9360', '65.1040', '64.8780'], ['Last Updated on  : 11:45:57 22-Feb-2018'], ["Disclaimer: We can't assure any guarantee about the accuaracy of the data."]]]

Welcome to the wonderful world of Selenium
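The nested lists in that output are awkward to consume. A possible post-processing step, assuming `data` is the list built in the loop (not the extra-wrapped `[data]` that gets printed), keeps only the four-cell quote rows and converts the prices to numbers. `rows_to_quotes` is a hypothetical helper, not part of the answer.

```python
def rows_to_quotes(rows):
    """Turn scraped rows into {symbol: {"ltp": ..., "high": ..., "low": ...}}.

    Rows without exactly four cells (empty rows, the market section headers,
    the "Last Updated" line and the disclaimer) are skipped.
    """
    quotes = {}
    for row in rows:
        if len(row) != 4:
            continue
        symbol, ltp, high, low = row
        try:
            quotes[symbol] = {"ltp": float(ltp), "high": float(high), "low": float(low)}
        except ValueError:
            # A cell that is not a plain number: skip this row rather than guess
            continue
    return quotes
```

For example, `rows_to_quotes(data)["MCXGold"]["ltp"]` gives the MCX gold LTP as a float. Floats are fine for display; use `decimal.Decimal` instead if exact values matter.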


1 Comment

I put soup properties on all my Page Objects for just this reason :) Grab the page once with the WebDriver, then parse the tree. Fast, elegant, and it avoids multiple Selenium calls.
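The comment's "grab once, parse the tree" idea could be sketched as a page object with a cached property. This is an assumption about one way to structure it, not the commenter's actual code: `LiveRatesPage`, `parse` and `refresh` are hypothetical names, and the parser is passed in so the sketch does not hard-code BeautifulSoup. It relies on `functools.cached_property` (Python 3.8+).

```python
from functools import cached_property


class LiveRatesPage:
    """Hypothetical page object: read page_source once, reuse the parsed tree."""

    def __init__(self, driver, parse):
        self.driver = driver
        # e.g. parse=lambda html: BeautifulSoup(html, "html.parser")
        self._parse = parse

    @cached_property
    def soup(self):
        # A single call into the browser; later accesses return the cached tree
        return self._parse(self.driver.page_source)

    def refresh(self):
        # The rates are live, so drop the cached tree; the next .soup re-reads the page
        self.__dict__.pop("soup", None)
```

With a real driver this might be used as `page = LiveRatesPage(driver, parse=lambda h: BeautifulSoup(h, "html.parser"))` followed by any number of `page.soup.find(...)` calls, with `page.refresh()` before reading newer quotes.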
