
I am new to Python and web scraping.

I am trying to extract information about the test components of clinical diagnostic tests from this page: https://labtestsonline.org/tests-index

The tests index lists the names of test components for various clinical tests. Clicking on a name takes you to another page with details about that individual test component. From each of these pages I would like to extract the section containing the common questions.
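A minimal sketch of that extraction step with BeautifulSoup. The page markup is not shown here, so it assumes (hypothetically) that each common question is an h2 heading and that its answer is every sibling element up to the next h2; the function name extract_common_questions and the heading_tag parameter are illustrative. Check the real structure with soup.prettify() and adjust heading_tag accordingly. It uses Python's built-in "html.parser" so no extra parser is needed, but "lxml" works the same way.

```python
from bs4 import BeautifulSoup


def extract_common_questions(html, heading_tag="h2"):
    """Map each question heading to the text that follows it.

    ASSUMPTION: questions are <h2> headings and each answer consists of
    the sibling elements up to the next heading of the same tag.
    """
    soup = BeautifulSoup(html, "html.parser")
    questions = {}
    for heading in soup.find_all(heading_tag):
        parts = []
        # Walk forward through the heading's sibling tags until the next question
        for sibling in heading.find_next_siblings():
            if sibling.name == heading_tag:
                break
            parts.append(sibling.get_text(" ", strip=True))
        questions[heading.get_text(strip=True)] = " ".join(p for p in parts if p)
    return questions
```

Called as extract_common_questions(response.text) on a downloaded test page, it returns a dict such as {"How is it used?": "..."}, which maps naturally onto one row of the final table.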

Finally, I would like to put together a data frame with the names of the test components in one column and each of the common questions in the remaining columns, as shown below:

Names    how_its_used    when_it_is_ordered  what_does_test_result_mean
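That layout is what pandas produces from a list of dicts, one dict per test component: the keys become columns, and a question missing on a page becomes NaN. A sketch under that approach follows; the mapping from page headings to column names is a hypothetical example (the real headings have to be read off the site), and build_table is an illustrative name.

```python
import pandas as pd

# HYPOTHETICAL mapping from page headings to the desired column names
COLUMN_NAMES = {
    "How is it used?": "how_its_used",
    "When is it ordered?": "when_it_is_ordered",
    "What does the test result mean?": "what_does_test_result_mean",
}


def build_table(records):
    """records: list of (name, {heading: answer}) pairs, one per test component."""
    rows = []
    for name, questions in records:
        row = {"Names": name}
        for heading, answer in questions.items():
            # Keep only the questions we have a column for
            if heading in COLUMN_NAMES:
                row[COLUMN_NAMES[heading]] = answer
        rows.append(row)
    # Fix the column order; questions missing from a page become NaN
    return pd.DataFrame(rows, columns=["Names", *COLUMN_NAMES.values()])
```

Passing columns= explicitly keeps the column order stable even when the first page scraped happens to lack one of the questions.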

So far I have only managed to get the names of the test components:

import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'https://labtestsonline.org/tests-index'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'lxml')
print(soup.prettify())  # inspect the raw HTML of the index page

# Get the names of the test components from the index
l = []
for i in soup.select("a[hreflang*=en]"):
    l.append(i.text)

names = pd.DataFrame({'col': l})  # convert the list to a DataFrame
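To reach each test's own page, the same selector can collect each link's href attribute along with its text. The sketch below turns relative hrefs into absolute URLs with urllib.parse.urljoin and then fetches the pages one at a time with a pause between requests. The function names, the one-second delay, and the assumption that every matching anchor points to a test page are illustrative choices, not something taken from the site.

```python
import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

INDEX_URL = "https://labtestsonline.org/tests-index"


def get_test_links(index_html, base_url=INDEX_URL):
    """Return (name, absolute_url) pairs for the test links on the index page."""
    soup = BeautifulSoup(index_html, "html.parser")
    links = []
    for a in soup.select("a[hreflang*=en]"):
        href = a.get("href")
        if href:  # skip anchors without a link target
            links.append((a.get_text(strip=True), urljoin(base_url, href)))
    return links


def fetch_test_pages(links, delay=1.0):
    """Yield (name, html) for each test page, pausing between requests."""
    with requests.Session() as session:
        for name, url in links:
            response = session.get(url, timeout=30)
            response.raise_for_status()  # stop on 4xx/5xx instead of parsing an error page
            yield name, response.text
            time.sleep(delay)  # be polite to the server
```

Typical use: links = get_test_links(requests.get(INDEX_URL, timeout=30).text), then loop over fetch_test_pages(links) and parse the common questions out of each page's HTML. Keeping parsing separate from downloading also makes the parsing easy to test on saved HTML.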

1 Answer

I suggest that you take a look at Scrapy, an open-source web scraping library. It will help with many of the concerns you might run into when scraping websites, such as:

  • Following the links on each page.
  • Scraping data only from pages that match a particular pattern, e.g. you might only want to scrape the /detail pages, while on the other pages you just collect links to crawl.
  • Extracting data with XPath and CSS selectors (built on lxml).
  • Concurrency, which lets you crawl multiple pages at the same time and can greatly speed up your scraper.

It's very easy to get started, and there are plenty of resources on building simple to advanced web scrapers with Scrapy.
