
I am trying to extract some data from a site using Beautiful Soup. Specifically, I need a table whose rows and cells are stored in `div` tags rather than the usual `table` tag. This means I cannot use the pandas `read_html` function to simply extract all the tables.

Here is the HTML I extracted:

```html
<div class="block">
<div class="expand">
<div class="expand-button collapsed" data-toggle="collapse">Forex</div>
<div class="panel-collapse collapse">
<div class="table">
<div class="search">
<div class="date"></div>
<div class="group search">
<span>Search </span>
<input class="search-box" type="search"/>
</div>
<div class="group ">
<span class="label"></span>
<span class="toggle-a"> </span>
<span class="toggle-b"> </span>
</div>
</div>
<div class="skin">
<div class="table visible">
<div class="header">
<div>Product</div>
<div>Account A</div>
</div>
<div class="column-header">
<div class="column-name">NAME</div>
<div class="column-name">DESCRIPTION</div>
<div class="column-name">Value1</div>
<div class="column-name">Value2</div>
<div class="column-name">Value3</div>
<div class="column-name">Value4</div>
</div>
<div class="table-row">
<div class="table-cell c1">bronze</div>
<div class="table-cell c2">3rd tier</div>
<div class="table-cell c3">0</div>
<div class="table-cell c4">1</div>
<div class="table-cell c5">1</div>
<div class="table-cell c6">1</div>
<div class="table-cell c-true">Account A</div>
<div class="table-cell c-standard">Account B</div>
</div>
<div class="table-row">
<div class="table-cell c1">silver</div>
<div class="table-cell c2">2nd tier</div>
<div class="table-cell c3">1</div>
<div class="table-cell c4">0</div>
<div class="table-cell c5">3</div>
<div class="table-cell c6">0</div>
<div class="table-cell c-true">Account A</div>
<div class="table-cell c-standard">Account B</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
```

And here is what I want at the end:

| Product |             | Account A |         | Account B |         |
|---------|-------------|-----------|---------|-----------|---------|
| NAME    | DESCRIPTION | Value 1   | Value 2 | Value 3   | Value 4 |
| bronze  | 3rd tier    | 0         | 1       | 1         | 1       |
| silver  | 2nd tier    | 1         | 0       | 3         | 0       |

Is there a simple way to do this using Python or Beautiful Soup?

1 Answer


The code below extracts the data from the given HTML tags. Here I have stored your snippet in a string variable named `html`:

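```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")  # html is a string holding the markup above
# Header row: the text of each div.column-name
first_row = [d.get_text(strip=True) for d in soup.select("div.column-header > div.column-name")]
# Data rows: keep the six value cells (c1..c6), dropping the trailing account cells
rows = []
for row in soup.select("div.table-row"):
    cells = [d.get_text(strip=True) for d in row.select("div.table-cell")]
    rows.append(cells[:6])
```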
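Since the question mentions pandas, here is a minimal sketch that loads the same `div`-based rows into a `DataFrame` instead. The two-level header (`Product`, `Account A`, `Account B`, each spanning two columns) is an assumption taken from the desired output in the question, not something the markup itself encodes. The HTML string is a trimmed copy of the question's snippet, and cell values are left as strings.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup: the column header and the two data rows
html = """
<div class="table visible">
<div class="column-header">
<div class="column-name">NAME</div>
<div class="column-name">DESCRIPTION</div>
<div class="column-name">Value1</div>
<div class="column-name">Value2</div>
<div class="column-name">Value3</div>
<div class="column-name">Value4</div>
</div>
<div class="table-row">
<div class="table-cell c1">bronze</div>
<div class="table-cell c2">3rd tier</div>
<div class="table-cell c3">0</div>
<div class="table-cell c4">1</div>
<div class="table-cell c5">1</div>
<div class="table-cell c6">1</div>
<div class="table-cell c-true">Account A</div>
<div class="table-cell c-standard">Account B</div>
</div>
<div class="table-row">
<div class="table-cell c1">silver</div>
<div class="table-cell c2">2nd tier</div>
<div class="table-cell c3">1</div>
<div class="table-cell c4">0</div>
<div class="table-cell c5">3</div>
<div class="table-cell c6">0</div>
<div class="table-cell c-true">Account A</div>
<div class="table-cell c-standard">Account B</div>
</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
names = [d.get_text(strip=True) for d in soup.select("div.column-name")]

# One list per div.table-row, truncated to the number of named columns
# (this drops the trailing c-true / c-standard account cells)
rows = [
    [d.get_text(strip=True) for d in row.select("div.table-cell")][: len(names)]
    for row in soup.select("div.table-row")
]

# Assumed column grouping, copied from the question's desired output
groups = ["Product", "Product", "Account A", "Account A", "Account B", "Account B"]
df = pd.DataFrame(rows, columns=pd.MultiIndex.from_arrays([groups, names]))
print(df)
```

Slicing each row to `len(names)` rather than a hard-coded 6 keeps the rows aligned with the header if the site adds or removes value columns.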

For table generation, you can install BeautifulTable (`pip install beautifultable`):

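```python
from beautifultable import BeautifulTable

table = BeautifulTable()
# Group labels as the header. This is the pre-1.0 beautifultable API;
# in 1.0+ these are table.columns.header and table.rows.append
table.column_headers = ["Product", "", "Account A", "", "Account B", ""]
table.append_row(first_row)  # the column names become the first data row
for row in rows:
    table.append_row(row)
print(table)
```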

Output:

```
+---------+-------------+-----------+--------+-----------+--------+
| Product |             | Account A |        | Account B |        |
+---------+-------------+-----------+--------+-----------+--------+
|  NAME   | DESCRIPTION |  Value1   | Value2 |  Value3   | Value4 |
+---------+-------------+-----------+--------+-----------+--------+
| bronze  |  3rd tier   |     0     |   1    |     1     |   1    |
+---------+-------------+-----------+--------+-----------+--------+
| silver  |  2nd tier   |     1     |   0    |     3     |   0    |
+---------+-------------+-----------+--------+-----------+--------+
```

You can also format the tabular data with the `tabulate` library.
