Unfortunately, pandas.read_html (docs) only extracts data from HTML tables:
import pandas as pd
html = '''<html>
<body>
<table>
<tr>
<th>Col1</th>
<th>Col2</th>
</tr>
<tr>
<td>Val1</td>
<td>Val2</td>
</tr>
</table>
</body>
</html>'''
# read_html returns a list of DataFrames, one per <table> found
dfs = pd.read_html(html)
dfs[0]
Output:
0 1
0 Col1 Col2
1 Val1 Val2
For the second case, where the HTML contains an unordered list instead, the existing pandas function won't work. You can instead parse the list (and all of its children) using an HTML parsing library like BeautifulSoup4 and build up the dataframe row by row. Here's a simple example:
import pandas as pd
from bs4 import BeautifulSoup
html = '''<html>
<body>
<ul id="target">
<li class="row">
Name
<ul class="details">
<li class="Col1">Val1</li>
<li class="Col2">Val2</li>
</ul>
</li>
</ul>
</body>
</html>'''
# Parse the HTML string
soup = BeautifulSoup(html, 'lxml')
# Select the target <ul> and build dicts for each row
data_dicts = []
target = soup.select('#target')[0]
for row in target.select('.row'):
    row_dict = {}
    row_dict['name'] = row.contents[0].strip() # Remove excess whitespace
    details = row.select('.details')
    for col in details[0].findChildren('li'):
        col_name = col.attrs['class'][0]
        col_value = col.text.strip()
        row_dict[col_name] = col_value
    data_dicts.append(row_dict)
# Convert list of dicts to dataframe
df = pd.DataFrame(data_dicts)
Output:
Col1 Col2 name
0 Val1 Val2 Name
Some combination of findChildren and select should let you extract each sub-component of the <ul>-based table on the site you linked. BeautifulSoup offers many ways of digging through HTML, so I strongly recommend working through some examples and checking the documentation if you get stuck trying to parse out a specific set of elements.
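If your page has several rows, and some of them are missing a detail item, the same idea can be wrapped in a small helper. This is only a sketch under assumptions: the function name nested_list_to_records, the list_id parameter, and the sample markup are made up for illustration, and it assumes your real page follows the same row/details class layout as the example above. It also uses Python's built-in "html.parser" backend instead of lxml, so nothing beyond bs4 and pandas is needed.

```python
import pandas as pd
from bs4 import BeautifulSoup


def nested_list_to_records(html, list_id="target"):
    """Turn each <li class="row"> under <ul id=list_id> into a dict.

    Hypothetical helper: the id and class names mirror the example
    markup above and would need adjusting for a real page.
    """
    # "html.parser" ships with Python, so lxml is not required here
    soup = BeautifulSoup(html, "html.parser")
    target = soup.find("ul", id=list_id)
    if target is None:
        return []

    records = []
    # recursive=False keeps us to the direct children of the target list
    for row in target.find_all("li", class_="row", recursive=False):
        # The row label is the first text node directly inside the <li>
        label = row.find(string=True, recursive=False)
        record = {"name": label.strip() if label else None}

        details = row.find("ul", class_="details")
        if details is not None:
            for col in details.find_all("li", recursive=False):
                # Use the first CSS class as the column name
                classes = col.get("class") or ["unknown"]
                record[classes[0]] = col.get_text(strip=True)
        records.append(record)
    return records


# Made-up sample data: the second row has no Col2 item
sample = '''<ul id="target">
  <li class="row">Alpha
    <ul class="details">
      <li class="Col1">1</li>
      <li class="Col2">2</li>
    </ul>
  </li>
  <li class="row">Beta
    <ul class="details">
      <li class="Col1">3</li>
    </ul>
  </li>
</ul>'''

records = nested_list_to_records(sample)
df = pd.DataFrame(records)
print(df)
```

Rows that lack a given item simply end up with a missing value (NaN) in that column, because pd.DataFrame fills in keys that are absent from a dict.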