You are assigning data = df.head(), which returns only the first 5 rows of the DataFrame. Instead you can do:
import pandas as pd
import requests

url = "https://clinicaltrials.gov/ct2/history/NCT02954874"
html_data2 = requests.get(url)
df = pd.read_html(html_data2.text)[0]  # first table found on the page
data = df  # not df.head()
Also, pandas can read HTML directly from a URL, so you can just do:
data = pd.read_html(r"https://clinicaltrials.gov/ct2/history/NCT02954874")[0]
and put that inside your try/except block.
Outputs:
url = "https://clinicaltrials.gov/ct2/history/NCT02954874"
html_data2 = requests.get(url)
df = pd.read_html(html_data2.text)[0]
data = df.head()
print(data)
   Version    A    B     Submitted Date                               Changes
0        1  NaN  NaN   November 3, 2016  Nothing (earliest Version on record)
1        2  NaN  NaN  November 24, 2016   Contacts/Locations and Study Status
2        3  NaN  NaN  November 28, 2016   Recruitment Status and Study Status
3        4  NaN  NaN  December 15, 2016   Contacts/Locations and Study Status
4        5  NaN  NaN  December 19, 2016   Contacts/Locations and Study Status
vs. the full DataFrame:
url = "https://clinicaltrials.gov/ct2/history/NCT02954874"
html_data2 = requests.get(url)
df = pd.read_html(html_data2.text)[0]
data = df
print(data)
     Version    A    B     Submitted Date                               Changes
0          1  NaN  NaN   November 3, 2016  Nothing (earliest Version on record)
1          2  NaN  NaN  November 24, 2016   Contacts/Locations and Study Status
2          3  NaN  NaN  November 28, 2016   Recruitment Status and Study Status
3          4  NaN  NaN  December 15, 2016   Contacts/Locations and Study Status
4          5  NaN  NaN  December 19, 2016   Contacts/Locations and Study Status
..       ...   ..   ..                ...                                   ...
558      559  NaN  NaN  December 19, 2019   Contacts/Locations and Study Status
559      560  NaN  NaN  December 20, 2019   Contacts/Locations and Study Status
560      561  NaN  NaN  December 23, 2019   Contacts/Locations and Study Status
561      562  NaN  NaN  December 25, 2019   Contacts/Locations and Study Status
562      563  NaN  NaN  December 27, 2019   Contacts/Locations and Study Status

[563 rows x 5 columns]
In short: change data = df.head() to data = df, or just use data = pd.read_html(html_data2.text)[0] and get rid of the extra line.
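If you want to check the difference without hitting the site, here is a minimal sketch using a made-up DataFrame. The 563-row frame is a stand-in I built to match the row count above, not the real scraped table, and the names preview, first_ten and full are just for illustration:

```python
import pandas as pd

# Hypothetical stand-in for the scraped table: 563 rows, one "Version" column.
df = pd.DataFrame({"Version": range(1, 564)})

preview = df.head()      # head() defaults to n=5, so only the first 5 rows
first_ten = df.head(10)  # pass n explicitly if you want a different preview size
full = df                # the whole table, which is what you want to keep

print(len(preview), len(first_ten), len(full))  # 5 10 563
```

So head() is handy for a quick look at the data, but anything you assign from it only carries those first few rows forward.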