I am pretty new to Python and am struggling to turn scraped web data into a clean Excel table. Here is the table I am trying to scrape and replicate in Python: HTML Table.
Here is what the HTML page looks like:
</div>
<section id="first" style="display:none" aria-label="Power situation graph section">
<div class="gridModule-2up">
<div class="prognos_controls hidden" data-proggraph="1">
Show data for:
<button value="1" onclick="this.blur();" type="button" class="btn btn--secondary prognosdaybutton"><span class="fa fa-clock-o" aria-hidden="true"></span> Yesterday</button>
<button value="2" onclick="this.blur();" type="button" class="btn btn--tertiary prognosdaybutton"><span class="fa fa-clock-o" aria-hidden="true"></span> Today</button>
<button value="3" onclick="this.blur();" type="button" class="btn btn--secondary prognosdaybutton"><span class="fa fa-clock-o" aria-hidden="true"></span> Tomorrow</button>
</div>
<table summary="Consumption" id="prognos_datatable_total" class="prognos_datatable scrollable">
<thead>
<tr>
<th data-sheets-numberformat="[null,1]"></th>
<th data-sheets-value="[null,2,'17/02/2020']" data-sheets-numberformat="[null,1]" scope="col">2020-02-17</th>
<th data-sheets-numberformat="[null,1]"></th>
<th data-sheets-value="[null,2,'18/02/2020']" data-sheets-numberformat="[null,1]" scope="col">2020-02-18</th>
<th data-sheets-numberformat="[null,1]"></th>
<th data-sheets-value="[null,2,'19/02/2020']" data-sheets-numberformat="[null,1]" scope="col">2020-02-19</th>
</tr>
<tr>
<th caldata-sheets-value="[null,2,'Timme']" data-sheets-numberformat="[null,1]" scope="col">Hour</th>
<th data-sheets-value="[null,2,'F\u00f6rbrukning']" data-sheets-numberformat="[null,1]" scope="col">Consumption</th>
<th data-sheets-value="[null,2,'Prognos']" data-sheets-numberformat="[null,1]" scope="col">Forecast</th>
<th data-sheets-value="[null,2,'F\u00f6rbrukning']" data-sheets-numberformat="[null,1]" scope="col">Consumption</th>
<th data-sheets-value="[null,2,'Prognos']" data-sheets-numberformat="[null,1]" scope="col">Forecast</th>
<th data-sheets-value="[null,2,'F\u00f6rbrukning']" data-sheets-numberformat="[null,1]" scope="col">Consumption</th>
<th data-sheets-value="[null,2,'Prognos']" data-sheets-numberformat="[null,1]" scope="col">Forecast</th>
</tr>
</thead>
<tbody>
<tr>
<th data-sheets-value="[null,2,'00-01']" data-sheets-numberformat="[null,1]" scope="col">
00-01
</th>
<td data-sheets-value="[null,2,'15544']" data-sheets-numberformat="[null,1]">15 544</td>
<td class="alert_1" data-sheets-value="[null,2,'15143']" data-sheets-numberformat="[null,1]">15 143</td>
<td data-sheets-value="[null,2,'15669']" data-sheets-numberformat="[null,1]">15 669</td>
<td class="alert_1" data-sheets-value="[null,2,'15869']" data-sheets-numberformat="[null,1]">15 869</td>
<td data-sheets-value="[null,2,'-']" data-sheets-numberformat="[null,1]">-</td>
<td class="alert_1" data-sheets-value="[null,2,'16422']" data-sheets-numberformat="[null,1]">16 422</td>
</tr>
<tr>
<th data-sheets-value="[null,2,'01-02']" data-sheets-numberformat="[null,1]" scope="col">
01-02
</th>
<td data-sheets-value="[null,2,'15238']" data-sheets-numberformat="[null,1]">15 238</td>
<td class="alert_1" data-sheets-value="[null,2,'15052']" data-sheets-numberformat="[null,1]">15 052</td>
<td data-sheets-value="[null,2,'15509']" data-sheets-numberformat="[null,1]">15 509</td>
<td class="alert_1" data-sheets-value="[null,2,'15366']" data-sheets-numberformat="[null,1]">15 366</td>
<td data-sheets-value="[null,2,'-']" data-sheets-numberformat="[null,1]">-</td>
<td class="alert_1" data-sheets-value="[null,2,'16176']" data-sheets-numberformat="[null,1]">16 176</td>
</tr>
<tr>
<th data-sheets-value="[null,2,'02-03']" data-sheets-numberformat="[null,1]" scope="col">
02-03
</th>
<td data-sheets-value="[null,2,'15250']" data-sheets-numberformat="[null,1]">15 250</td>
<td class="alert_1" data-sheets-value="[null,2,'15135']" data-sheets-numberformat="[null,1]">15 135</td>
<td data-sheets-value="[null,2,'15576']" data-sheets-numberformat="[null,1]">15 576</td>
<td class="alert_1" data-sheets-value="[null,2,'15501']" data-sheets-numberformat="[null,1]">15 501</td>
<td data-sheets-value="[null,2,'-']" data-sheets-numberformat="[null,1]">-</td>
<td class="alert_1" data-sheets-value="[null,2,'16124']" data-sheets-numberformat="[null,1]">16 124</td>
</tr>
<tr>
<th data-sheets-value="[null,2,'03-04']" data-sheets-numberformat="[null,1]" scope="col">
03-04
</th>.............
Here is the code I used:
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'myURLlink'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')

# The table I want is the second <table> element on the page
extract = soup.select("table")[1]

# Collect the text of every header and data cell, row by row
table = [[item.text for item in row_data.select("th,td")]
         for row_data in extract.select("tr")]

# Print each row as one space-separated line
for item in table:
    print(' '.join(item))
This is what my output looks like with this code: Output.
How can I turn this into a proper pandas DataFrame that I can then export to Excel?
I would appreciate any help.
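
For reference, here is a minimal sketch of the kind of result I am after, run against a shortened copy of the HTML above instead of the live page. The names SAMPLE_HTML, clean_number and table_to_dataframe are made up for illustration, and the sketch assumes that every date in the first header row covers exactly two value columns (Consumption and Forecast), that thousands are separated by spaces, and that "-" means a missing value. It uses the built-in "html.parser" so that it runs without lxml.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Shortened copy of the table from the question (hypothetical test input)
SAMPLE_HTML = """
<table id="prognos_datatable_total">
<thead>
<tr><th></th><th>2020-02-17</th><th></th><th>2020-02-18</th><th></th><th>2020-02-19</th></tr>
<tr><th>Hour</th><th>Consumption</th><th>Forecast</th><th>Consumption</th>
<th>Forecast</th><th>Consumption</th><th>Forecast</th></tr>
</thead>
<tbody>
<tr><th>
00-01
</th><td>15 544</td><td>15 143</td><td>15 669</td><td>15 869</td><td>-</td><td>16 422</td></tr>
<tr><th>
01-02
</th><td>15 238</td><td>15 052</td><td>15 509</td><td>15 366</td><td>-</td><td>16 176</td></tr>
</tbody>
</table>
"""


def clean_number(text):
    """Turn '15 544' into 15544.0; anything non-numeric (e.g. '-') becomes NaN."""
    digits = text.replace("\xa0", "").replace(" ", "")
    return float(digits) if digits.isdigit() else float("nan")


def table_to_dataframe(html, table_id="prognos_datatable_total"):
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)

    # First header row holds the dates (with empty filler cells),
    # second header row holds 'Hour' plus the per-date column labels.
    date_row, label_row = table.thead.find_all("tr")
    dates = [th.get_text(strip=True) for th in date_row.find_all("th")]
    dates = [d for d in dates if d]
    labels = [th.get_text(strip=True) for th in label_row.find_all("th")]

    # Assumption: each date spans two consecutive value columns.
    columns = pd.MultiIndex.from_tuples(
        [(dates[i // 2], label) for i, label in enumerate(labels[1:])]
    )

    hours, rows = [], []
    for tr in table.tbody.find_all("tr"):
        hours.append(tr.th.get_text(strip=True))
        rows.append([clean_number(td.get_text(strip=True))
                     for td in tr.find_all("td")])

    return pd.DataFrame(rows, index=pd.Index(hours, name=labels[0]),
                        columns=columns)


df = table_to_dataframe(SAMPLE_HTML)
print(df)
```

With the real page, I would expect to pass response.text instead of SAMPLE_HTML and then call df.to_excel("consumption.xlsx") (the file name is only an example), which needs an Excel writer engine such as openpyxl to be installed.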