Extracting data from multiple links within the same web page using python

Question

I am a newbie to Python and web scraping.

I am trying to extract information about the test components of clinical diagnostic tests from this link: https://labtestsonline.org/tests-index

The tests index lists the names of test components for various clinical tests. Clicking on a name takes you to another page with details about that individual test component. From that page I would like to extract the part that contains the common questions, and then put together a data frame with the names of the test components in one column and each question from the common questions in the remaining columns (as shown below).

Names    how_its_used    when_it_is_ordered  what_does_test_result_mean

So far I have only managed to get the names of the test components.

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://labtestsonline.org/tests-index'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'lxml')
print(soup.prettify())

# Get the names of the test components from the index
names_list = []
for link in soup.select("a[hreflang*=en]"):
    names_list.append(link.text)

# Convert the list of names to a DataFrame
names = pd.DataFrame({'col': names_list})

Marcus Lind · Accepted Answer · 2018-01-19 08:33:10Z


I suggest that you take a look at the open source web scraping library Scrapy. It will help you with many of the concerns that you might run into when scraping websites, such as:

Following the links on each page.
Scraping data only from pages that match a particular pattern, e.g. you might only want to scrape data from the /detail pages, while on the other pages you only collect links to crawl.
lxml and CSS selectors.
Concurrency, allowing you to crawl multiple pages at the same time, which will greatly speed up your scraper.

It's very easy to get going, and there are a lot of resources out there on how to build simple to advanced web scrapers using the Scrapy library.
