
I am trying to extract data from the HTML table on the following website: https://fuelkaki.sg/home


My Python code is shown below. pandas is unable to detect the table. I suspect this is because Beautiful Soup is not capturing the page's HTML properly.

import sys
import time
from bs4 import BeautifulSoup
import requests
import pandas as pd

try:
    url = 'https://fuelkaki.sg/home'
    headers={'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36 Edg/97.0.1072.69'}
    page=requests.get(url, headers=headers)
except Exception as e:
    error_type, error_obj, error_info = sys.exc_info()
    print ('ERROR FOR LINK:', url)
    print (error_type, 'Line:', error_info.tb_lineno)
    
time.sleep(2)
soup=BeautifulSoup(page.text,'html.parser')

df = pd.read_html(page.text)
df

I have also tried Selenium (see the code below), but I am still unable to capture the table's contents.

import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import pandas as pd

url = 'https://fuelkaki.sg/home'
options = Options()
options.binary_location = r"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe"    # Chrome binary location (raw string so the backslashes are not treated as escapes)
options.add_argument('--headless')
options.add_argument('--disable-gpu')
driver = webdriver.Chrome(options=options)
driver.get(url)
time.sleep(3)
page = driver.page_source
driver.quit()
soup = BeautifulSoup(page, 'html.parser')


df = pd.read_html(page)
df

Any advice would be much appreciated.

  • It is not a static page, so you cannot fetch its data using requests. Commented Mar 8, 2022 at 6:56
  • Use something like selenium. Commented Mar 8, 2022 at 6:57
  • I have tried using Selenium (see above), but still to no avail Commented Mar 8, 2022 at 11:34
  • See my answer... Commented Mar 8, 2022 at 15:25
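If, as the comments suggest, the table is filled in by JavaScript after the page loads, the HTML you download will not contain a `<table>` element at all. One quick way to check is to count the `<table>` tags in the HTML you actually received. The sketch below uses only the standard library; `TableCounter` and `count_tables` are hypothetical helper names, and the two HTML strings are made-up stand-ins rather than the real page.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Counts <table> start tags in an HTML string (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tables = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag == "table":
            self.tables += 1


def count_tables(html: str) -> int:
    """Return how many <table> elements start in the given HTML."""
    counter = TableCounter()
    counter.feed(html)
    counter.close()
    return counter.tables


# Made-up stand-ins: a JavaScript shell page versus a fully rendered page.
static_shell = "<html><body><div id='app'></div><script src='main.js'></script></body></html>"
rendered = "<html><body><table class='table'><tr><td>95</td></tr></table></body></html>"

print(count_tables(static_shell))  # 0
print(count_tables(rendered))      # 1
```

Running `count_tables(page.text)` on the requests response, or `count_tables(page)` on the Selenium page source, shows whether there is anything for `pd.read_html` to find. A result of 0 means the table had not been rendered in that HTML.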

1 Answer

Use:

import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import numpy as np
import pandas as pd

url = 'https://fuelkaki.sg/home'
options = Options()

# Note: no '--headless' flag and no binary_location here
options.add_argument('--disable-gpu')
driver = webdriver.Chrome(options=options)
driver.get(url)
time.sleep(3)  # give the page time to render the table
page = driver.page_source
driver.quit()

soup = BeautifulSoup(page, 'html.parser')
table = soup.find("table", { "class" : "table" })

# Collect the text of every cell, drop the U+202C formatting character,
# and reshape the flat list into rows of 5 columns
cells = [x.text.replace('\u202c', '') for x in table.find_all('td')]
pd.DataFrame(np.array(cells).reshape(-1, 5))

Output:

(The original answer showed a screenshot of the resulting DataFrame here.)

Please be aware that using website data can be unethical.
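Two parts of this approach can be sketched a little more defensively. This is only a sketch under stated assumptions, not the method used above. First, the fixed `time.sleep(3)` could be replaced by an explicit Selenium wait that returns as soon as a table cell exists. The CSS selector `table.table td` is an assumption based on the `class="table"` lookup in this answer. Second, grouping the flat list of cells into rows of five does not need NumPy. `CellCollector`, `cells_to_rows` and `fetch_rendered_html` are hypothetical helper names, and the sample HTML is made up.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collects the text of every <td> cell in a document (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        # Text inside nested tags (e.g. <td><b>x</b></td>) is appended too.
        if self._in_td:
            self.cells[-1] += data


def cells_to_rows(html, width=5):
    """Group all <td> texts into rows of `width` cells, like reshape(-1, width)."""
    collector = CellCollector()
    collector.feed(html)
    collector.close()
    # Drop the U+202C formatting character; unlike the answer, also strip whitespace.
    cells = [c.replace("\u202c", "").strip() for c in collector.cells]
    if len(cells) % width:
        raise ValueError(f"{len(cells)} cells do not divide into rows of {width}")
    return [cells[i:i + width] for i in range(0, len(cells), width)]


def fetch_rendered_html(url, timeout=10):
    """Load `url` in Chrome and wait until a table cell is present.

    Selenium is imported lazily so the parsing helper works without it.
    The selector "table.table td" is an assumption, not taken from the site.
    """
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "table.table td"))
        )
        return driver.page_source
    finally:
        driver.quit()


# Made-up sample standing in for the rendered page source.
sample = (
    "<table class='table'>"
    "<tr><td>Brand A</td><td>1.00\u202c</td><td>2.00</td><td>3.00</td><td>4.00</td></tr>"
    "<tr><td>Brand B</td><td>1.10</td><td>2.10</td><td>3.10</td><td>4.10</td></tr>"
    "</table>"
)
rows = cells_to_rows(sample)
print(rows)
```

With a real browser available, `pd.DataFrame(cells_to_rows(fetch_rendered_html(url)))` would be the equivalent of the last line of the answer. Note that `cells_to_rows` collects every `<td>` in the document rather than only those inside the table with class `table`, so it assumes the page contains a single table.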


1 Comment

I added "import numpy as np" to the code and it works now. Thanks
