I am new to Python and web scraping.
I am trying to extract information about the test components of clinical diagnostic tests from this page: https://labtestsonline.org/tests-index
The tests index lists the names of test components for various clinical tests. Clicking on each name takes you to a page with details about that individual test component. From each of these pages I would like to extract the section containing the common questions, and finally put together a data frame with the names of the test components in one column and each of the common questions as the remaining columns (as shown below).
Names how_its_used when_it_is_ordered what_does_test_result_mean
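One possible sketch of the per-page step follows. It assumes (hypothetically, since I have not checked the detail pages' actual markup) that each common question is an `<h3>` heading and that its answer is the sibling elements up to the next `<h3>`. The helper names `to_column_name` and `extract_common_questions` are illustrative, and `heading_tag` should be adjusted after inspecting a real page. Each page becomes one dict, and a list of such dicts converts directly into a DataFrame of the shape above:

```python
import re

import pandas as pd
from bs4 import BeautifulSoup


def to_column_name(question):
    """Turn a heading such as 'How is it used?' into 'how_is_it_used'."""
    words = re.findall(r"[a-z0-9]+", question.lower())
    return "_".join(words)


def extract_common_questions(html, heading_tag="h3"):
    """Map each question heading to the text that follows it.

    ASSUMPTION (hypothetical markup): questions are <h3> elements and each
    answer is every sibling element up to the next heading of the same tag.
    """
    soup = BeautifulSoup(html, "html.parser")
    answers = {}
    for heading in soup.find_all(heading_tag):
        parts = []
        for sibling in heading.find_next_siblings():
            if sibling.name == heading_tag:
                break  # reached the next question
            parts.append(sibling.get_text(" ", strip=True))
        answers[to_column_name(heading.get_text(strip=True))] = " ".join(parts)
    return answers


# Made-up sample page, only to show the shape of the result.
sample_page = """
<div class="faq">
  <h3>How is it used?</h3>
  <p>To screen for a condition.</p>
  <h3>When is it ordered?</h3>
  <p>During a routine checkup.</p>
  <p>Or when symptoms appear.</p>
</div>
"""

row = {"Names": "Example test"}
row.update(extract_common_questions(sample_page))
df = pd.DataFrame([row])  # one dict per test component -> one row each
print(df)
```

The column names are derived from the heading text (e.g. `how_is_it_used`); `df.rename(columns={...})` can map them to shorter names such as `how_its_used`. When rows from several pages are combined, a page that lacks a given question simply gets `NaN` in that column.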
So far I have only managed to get the names of the test components:
import requests
from bs4 import BeautifulSoup
import pandas as pd
url = 'https://labtestsonline.org/tests-index'
response = requests.get(url)
response.raise_for_status()  # stop early if the request failed
soup = BeautifulSoup(response.content, 'lxml')
print(soup.prettify())
l = []  # get the names of the test components from the index
for i in soup.select("a[hreflang*=en]"):
    l.append(i.text)
names = pd.DataFrame({'Names': l})  # convert the above list to a dataframe
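A sketch of how the loop above could be extended: keep each link's `href`, resolve it against the index URL with `urllib.parse.urljoin`, and visit every detail page. It assumes (unverified) that the `a[hreflang*=en]` links point at the test-component pages. `parse_detail` is a hypothetical callable you supply, taking a page's HTML and returning a dict of `{column_name: answer_text}`; `fetch` is injectable so the logic can be tried without network access. `html.parser` is used instead of `lxml` only to avoid the extra dependency.

```python
import time
from urllib.parse import urljoin

import pandas as pd
from bs4 import BeautifulSoup

INDEX_URL = "https://labtestsonline.org/tests-index"


def fetch_html(url):
    """Download a page; requests is imported here so offline tests can skip it."""
    import requests

    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.text


def index_links(html, base_url=INDEX_URL):
    """Return (name, absolute_url) pairs for the links in the tests index."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for a in soup.select("a[hreflang*=en]"):
        href = a.get("href")
        if href:  # skip anchors without a link target
            links.append((a.get_text(strip=True), urljoin(base_url, href)))
    return links


def scrape(parse_detail, fetch=fetch_html, index_url=INDEX_URL, delay=1.0):
    """Build one row per test component: its name plus parse_detail's columns."""
    rows = []
    for name, url in index_links(fetch(index_url), index_url):
        row = {"Names": name}
        row.update(parse_detail(fetch(url)))
        rows.append(row)
        time.sleep(delay)  # pause between requests to avoid hammering the site
    return pd.DataFrame(rows)
```

On the live site this would be called as `scrape(my_parser)`, where `my_parser` is whatever function extracts the common questions from a detail page. The one-second `delay` is an arbitrary polite default, not a value taken from the site.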