I am pretty new to Python and am struggling to get scraped web data into a clean Excel table. Here is the table I am trying to scrape and replicate in Python: HTML Table.

Here is what the HTML page looks like:

</div>
    <section id="first" style="display:none" aria-label="Power situation graph section">
        <div class="gridModule-2up">
            <div class="prognos_controls hidden" data-proggraph="1">
                Show data for:
                <button value="1" onclick="this.blur();" type="button" class="btn  btn--secondary prognosdaybutton"><span class="fa fa-clock-o" aria-hidden="true"></span> Yesterday</button>
                <button value="2" onclick="this.blur();" type="button" class="btn  btn--tertiary prognosdaybutton"><span class="fa fa-clock-o" aria-hidden="true"></span> Today</button>
                <button value="3" onclick="this.blur();" type="button" class="btn  btn--secondary prognosdaybutton"><span class="fa fa-clock-o" aria-hidden="true"></span> Tomorrow</button>
            </div>
            <table summary="Consumption" id="prognos_datatable_total" class="prognos_datatable scrollable">
                <thead>
                    <tr>
                                <th data-sheets-numberformat="[null,1]"></th>
                                <th data-sheets-value="[null,2,'17/02/2020']" data-sheets-numberformat="[null,1]" scope="col">2020-02-17</th>
                                <th data-sheets-numberformat="[null,1]"></th>
                                <th data-sheets-value="[null,2,'18/02/2020']" data-sheets-numberformat="[null,1]" scope="col">2020-02-18</th>
                                <th data-sheets-numberformat="[null,1]"></th>
                                <th data-sheets-value="[null,2,'19/02/2020']" data-sheets-numberformat="[null,1]" scope="col">2020-02-19</th>

                    </tr>
                    <tr>
                        <th caldata-sheets-value="[null,2,'Timme']" data-sheets-numberformat="[null,1]" scope="col">Hour</th>
                                <th data-sheets-value="[null,2,'F\u00f6rbrukning']" data-sheets-numberformat="[null,1]" scope="col">Consumption</th>
                                <th data-sheets-value="[null,2,'Prognos']" data-sheets-numberformat="[null,1]" scope="col">Forecast</th>
                                <th data-sheets-value="[null,2,'F\u00f6rbrukning']" data-sheets-numberformat="[null,1]" scope="col">Consumption</th>
                                <th data-sheets-value="[null,2,'Prognos']" data-sheets-numberformat="[null,1]" scope="col">Forecast</th>
                                <th data-sheets-value="[null,2,'F\u00f6rbrukning']" data-sheets-numberformat="[null,1]" scope="col">Consumption</th>
                                <th data-sheets-value="[null,2,'Prognos']" data-sheets-numberformat="[null,1]" scope="col">Forecast</th>

                    </tr>
                </thead>
                <tbody>
                    <tr>
                        <th data-sheets-value="[null,2,'00-01']" data-sheets-numberformat="[null,1]" scope="col">
                            00-01
                        </th>

                            <td data-sheets-value="[null,2,'15544']" data-sheets-numberformat="[null,1]">15&#160;544</td>
                            <td class="alert_1" data-sheets-value="[null,2,'15143']" data-sheets-numberformat="[null,1]">15&#160;143</td>
                            <td data-sheets-value="[null,2,'15669']" data-sheets-numberformat="[null,1]">15&#160;669</td>
                            <td class="alert_1" data-sheets-value="[null,2,'15869']" data-sheets-numberformat="[null,1]">15&#160;869</td>
                            <td data-sheets-value="[null,2,'-']" data-sheets-numberformat="[null,1]">-</td>
                            <td class="alert_1" data-sheets-value="[null,2,'16422']" data-sheets-numberformat="[null,1]">16&#160;422</td>
                    </tr>
                    <tr>
                        <th data-sheets-value="[null,2,'01-02']" data-sheets-numberformat="[null,1]" scope="col">
                            01-02
                        </th>

                            <td data-sheets-value="[null,2,'15238']" data-sheets-numberformat="[null,1]">15&#160;238</td>
                            <td class="alert_1" data-sheets-value="[null,2,'15052']" data-sheets-numberformat="[null,1]">15&#160;052</td>
                            <td data-sheets-value="[null,2,'15509']" data-sheets-numberformat="[null,1]">15&#160;509</td>
                            <td class="alert_1" data-sheets-value="[null,2,'15366']" data-sheets-numberformat="[null,1]">15&#160;366</td>
                            <td data-sheets-value="[null,2,'-']" data-sheets-numberformat="[null,1]">-</td>
                            <td class="alert_1" data-sheets-value="[null,2,'16176']" data-sheets-numberformat="[null,1]">16&#160;176</td>
                    </tr>
                    <tr>
                        <th data-sheets-value="[null,2,'02-03']" data-sheets-numberformat="[null,1]" scope="col">
                            02-03
                        </th>

                            <td data-sheets-value="[null,2,'15250']" data-sheets-numberformat="[null,1]">15&#160;250</td>
                            <td class="alert_1" data-sheets-value="[null,2,'15135']" data-sheets-numberformat="[null,1]">15&#160;135</td>
                            <td data-sheets-value="[null,2,'15576']" data-sheets-numberformat="[null,1]">15&#160;576</td>
                            <td class="alert_1" data-sheets-value="[null,2,'15501']" data-sheets-numberformat="[null,1]">15&#160;501</td>
                            <td data-sheets-value="[null,2,'-']" data-sheets-numberformat="[null,1]">-</td>
                            <td class="alert_1" data-sheets-value="[null,2,'16124']" data-sheets-numberformat="[null,1]">16&#160;124</td>
                    </tr>
                    <tr>
                        <th data-sheets-value="[null,2,'03-04']" data-sheets-numberformat="[null,1]" scope="col">
                            03-04
                        </th>.............

Here is the code I used:

import requests
import lxml.html as lh
import pandas as pd
from bs4 import BeautifulSoup
import csv

url = 'myURLlink'

response = requests.get(url) 

soup = BeautifulSoup(response.text, 'lxml')

extract = soup.select("table")[1]

table = [[item.text for item in row_data.select("th,td")]
                for row_data in extract.select("tr")]

for item in table:
    print(' '.join(item))

This is how my output looks with this code: Output.

How can I create a normal data frame from this that I can then export to Excel?

I would appreciate any help.

2 Answers

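The issue is caused by whitespace: the row-header cells contain newlines and indentation around their text, and these end up in the printed output. Calling strip() on each cell's text removes them.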

from bs4 import BeautifulSoup

# Parse a local copy of the HTML from the question
with open("sample.html", "r") as f:
    contents = f.read()

soup = BeautifulSoup(contents, 'lxml')
extract = soup.find("table")

# strip() removes the leading and trailing whitespace around each cell's text
table = [[item.text.strip() for item in row_data.select("th,td")]
         for row_data in extract.select("tr")]

for item in table:
    print(' '.join(item))

Check output here
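
To see why strip() helps, here is a minimal sketch that uses only plain strings. The two cell values are copied from the HTML in the question ("\xa0" is the non-breaking space that &#160; decodes to). Note that strip() only trims the ends, so the non-breaking space inside a number has to be removed separately before the value can be converted to an integer.

```python
# Text of a row-header cell as it appears in the HTML: newlines plus indentation
header_cell = "\n                            00-01\n                        "
# Text of a data cell: &#160; has been decoded to a non-breaking space
data_cell = "15\xa0544"

# strip() trims whitespace at both ends only
clean_header = header_cell.strip()
print(repr(clean_header))

# The separator inside the number survives strip(), so replace it explicitly
clean_number = int(data_cell.strip().replace("\xa0", ""))
print(clean_number)
```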

4 Comments

You could remove the useless imports (csv, lxml.html, requests, pandas) from your code snippet. Along with commented lines that do not provide value.
Thank you Kunal! It helped a lot. Is it possible to create a data frame out of this result?
Katya, yes, it is possible. Just convert the list to a DataFrame: df = pd.DataFrame(table)
And then: df.to_csv("demo.csv", index=False)
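
A slightly fuller sketch of the DataFrame suggestion from the comments, in case it helps. The rows below are copied from the HTML sample in the question, in the shape the list comprehension in the answer would produce. The function name rows_to_frame and the "date + label" column names are my own choices for illustration, and pairing each date with two columns is an assumption based on the sample header.

```python
import pandas as pd

# First rows of `table` (values taken from the HTML sample in the question)
table = [
    ["", "2020-02-17", "", "2020-02-18", "", "2020-02-19"],
    ["Hour", "Consumption", "Forecast", "Consumption", "Forecast",
     "Consumption", "Forecast"],
    ["00-01", "15\xa0544", "15\xa0143", "15\xa0669", "15\xa0869", "-", "16\xa0422"],
    ["01-02", "15\xa0238", "15\xa0052", "15\xa0509", "15\xa0366", "-", "16\xa0176"],
]


def rows_to_frame(table):
    """Build a numeric DataFrame indexed by hour (hypothetical helper)."""
    dates = [cell for cell in table[0] if cell]   # the non-empty header cells
    labels = table[1][1:]                         # Consumption / Forecast labels
    # Assumption: each date covers two adjacent columns
    columns = [f"{dates[i // 2]} {label}" for i, label in enumerate(labels)]
    df = pd.DataFrame(table[2:], columns=["Hour"] + columns).set_index("Hour")
    # Remove the non-breaking thousands separator, then convert to numbers;
    # cells that are not numeric (the "-" placeholders) become NaN
    df = df.apply(lambda col: col.str.replace("\xa0", "", regex=False))
    return df.apply(pd.to_numeric, errors="coerce")


df = rows_to_frame(table)
print(df)
```

From there, df.to_csv("demo.csv") writes a CSV file, and df.to_excel("demo.xlsx") writes an Excel file, provided an Excel writer such as openpyxl is installed.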

Try pandas here. pd.read_html parses HTML tables for you, using lxml or BeautifulSoup under the hood. I can't test it on your URL since you haven't provided one.

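import pandas as pd

url = 'myURLlink'
# read_html returns a list of DataFrames, one per <table>; take the second one
df = pd.read_html(url)[1]

# Write the table to CSV without the index column, then print it in full
df.to_csv('file.csv', index=False)
print(df.to_string())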

2 Comments

Hi! Thank you for your reply. I get a CSV file with some weird symbols inside. Maybe you could take a look? The url is: svk.se/en/national-grid/the-control-room . Thank you!
@Katya, change the line to: df.to_csv('file.csv', encoding='utf-8-sig', index=False)
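
For context on why utf-8-sig helps: it writes the same UTF-8 bytes but prepends a byte order mark (BOM), which Excel uses to recognise the file as UTF-8. Without it, Excel may fall back to a legacy code page and show garbage for non-ASCII characters such as the non-breaking spaces in the numbers. The sketch below uses only the standard library; the sample text and the choice of cp1252 as the legacy code page are assumptions for illustration.

```python
# A line like the ones in the CSV: a Swedish label and a number that
# contains a non-breaking space ("\xa0") as thousands separator
text = "Förbrukning,15\xa0544\n"

plain = text.encode("utf-8")          # no BOM
with_bom = text.encode("utf-8-sig")   # same bytes, preceded by the BOM

# The only difference is the three-byte BOM at the start
print(with_bom[:3])

# What a reader sees if the plain bytes are decoded with a legacy code page
garbled = plain.decode("cp1252")
print(garbled)

# Decoding with utf-8-sig drops the BOM and recovers the original text
restored = with_bom.decode("utf-8-sig")
```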
