Webscrape interactive chart in Python using beautiful soup with loops

Question

The code below prints the values from every numeric tag on the page. Can I filter it so that the values are extracted once for each region?

For example, on https://opensignal.com/reports/2019/04/uk/mobile-network-experience I am only interested in the numbers under the regional analysis tab, for all regions.

import requests
from bs4 import BeautifulSoup

html=requests.get("https://opensignal.com/reports/2019/04/uk/mobile-network-experience").text
soup=BeautifulSoup(html,'html.parser')
items=soup.find_all('div',class_='c-ru-graph__rect')


for item in items:
    provider=item.find('span', class_='c-ru-graph__label').text
    prodvalue=item.find_next_sibling('span').find('span', class_='c-ru-graph__number').text
    print(provider + " : " + prodvalue)

I want a table or DataFrame like the one below, e.g. for the Eastern region:

                          O2    Vodafone      3      EE
4G Availability           82        76.9   73.0    89.2
Upload Speed Experience  5.6         5.9    6.8     9.5

Any pointers that would help me get this result?

@Wonka, the problem is the structure of what the function returns. It grabs every numeric value on the page instead of grouping by region or by KPI (4G availability, upload experience, etc.).
– EricA, Commented May 10, 2019 at 17:23

QHarr · Accepted Answer · 2019-05-11 05:44:04Z

Here is how I would do it for all regions. It requires bs4 4.7.1 or later, because the selector uses the :has() pseudo-class. AFAICS you have to assume the companies appear in the same order in every chart.

import requests
from bs4 import BeautifulSoup
import pandas as pd

r = requests.get("https://opensignal.com/reports/2019/04/uk/mobile-network-experience")
soup = BeautifulSoup(r.content, 'lxml')  # use 'html.parser' if lxml is not installed
metrics = ['4g-availability', 'video-experience', 'download-speed', 'upload-speed', 'latency']
headers = ['O2', 'Vodafone', '3', 'EE']  # assumed to be the order used in every chart
results = []

for region in soup.select('.s-regional-analysis__region'):
    for metric in metrics:
        # :has() keeps only the chart that contains this metric
        selector = f'.c-ru-chart:has([data-metric="{metric}"]) .c-ru-graph__number'
        providers = [item.text for item in region.select(selector)]
        row = dict(zip(headers, providers))  # pair each value with its company
        row['data-metric'] = metric
        row['region'] = region['id']
        results.append(row)

df = pd.DataFrame(results, columns=['region', 'data-metric'] + headers)
print(df)
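
To get from the long df above to the layout asked for in the question (metrics as rows, providers as columns, one region at a time), something like the following might work. It is only a sketch: the helper name region_table is made up for illustration, the two sample rows stand in for the scraped results and use the Eastern figures quoted in this thread, and every column other than 'region' and 'data-metric' is assumed to be a provider.

```python
import pandas as pd

# Illustrative rows in the same long format as `results` above.
# The values are strings because .text returns strings.
sample = [
    {"region": "eastern", "data-metric": "4g-availability",
     "O2": "82.0", "Vodafone": "76.9", "3": "73.0", "EE": "89.2"},
    {"region": "eastern", "data-metric": "upload-speed",
     "O2": "5.6", "Vodafone": "5.9", "3": "6.8", "EE": "9.5"},
]
df = pd.DataFrame(sample)


def region_table(df, region):
    """Return one region as a metric-by-provider table of numbers.

    Hypothetical helper: assumes every column except 'region' and
    'data-metric' holds the values for one provider.
    """
    subset = df[df["region"] == region]
    table = subset.drop(columns="region").set_index("data-metric")
    # Convert the scraped strings to numbers; unparsable cells become NaN.
    return table.apply(pd.to_numeric, errors="coerce")


eastern = region_table(df, "eastern")
print(eastern)
```

Calling region_table once per value of df['region'].unique() would give one such table per region.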


sentence · Answer · 2019-05-10 17:48:30Z

Assuming the order of the companies is fixed (and it is), you can simply narrow the content you examine to the divs that contain the information you need.

import requests
import pandas as pd
from bs4 import BeautifulSoup

html = requests.get("https://opensignal.com/reports/2019/04/uk/mobile-network-experience").text
soup = BeautifulSoup(html, 'html.parser')

# Restrict the search to the block for a single region
res = soup.find_all('div', {'id': 'eastern'})

# Locate each chart of interest and read its display name
aval = res[0].find_all('div', {'data-chart-name': '4g-availability'})
avalname = aval[0].find('span', {'class': 'js-metric-name'}).text

upload = res[0].find_all('div', {'data-chart-name': 'upload-speed'})
uploadname = upload[0].find('span', {'class': 'js-metric-name'}).text

# Company labels, taken from the first chart (same order in every chart)
companies = [i.text for i in aval[0].find_all('span', class_='c-ru-graph__label')]

# One list of values per metric, in the same company order
row1 = [i.text for i in aval[0].find_all('span', class_='c-ru-graph__number')]
row2 = [i.text for i in upload[0].find_all('span', class_='c-ru-graph__number')]

df = pd.DataFrame({avalname: row1, uploadname: row2})
df.index = companies

# Transpose so that metrics are rows and companies are columns
df = df.T
print(df)

Output:

                          O2    Vodafone      3      EE
4G Availability         82.0        76.9   73.0    89.2
Upload Speed Experience  5.6         5.9    6.8     9.5
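
The same find/find_all approach could probably be wrapped in a loop to cover more than one region. The following is only a sketch under assumptions: SAMPLE_HTML is a hypothetical, simplified stand-in for the report page that reuses just the id, attribute and class names from the code above; "example-region" and its numbers are made-up placeholders; and region_frame is an illustrative helper name. Labels are paired with numbers chart by chart, so the sketch does not depend on taking the company names from one particular chart.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical stand-in for the page (not the real markup).
SAMPLE_HTML = """
<div id="eastern">
  <div data-chart-name="4g-availability">
    <span class="js-metric-name">4G Availability</span>
    <span class="c-ru-graph__label">O2</span>
    <span class="c-ru-graph__number">82.0</span>
    <span class="c-ru-graph__label">EE</span>
    <span class="c-ru-graph__number">89.2</span>
  </div>
  <div data-chart-name="upload-speed">
    <span class="js-metric-name">Upload Speed Experience</span>
    <span class="c-ru-graph__label">O2</span>
    <span class="c-ru-graph__number">5.6</span>
    <span class="c-ru-graph__label">EE</span>
    <span class="c-ru-graph__number">9.5</span>
  </div>
</div>
<div id="example-region">
  <div data-chart-name="4g-availability">
    <span class="js-metric-name">4G Availability</span>
    <span class="c-ru-graph__label">O2</span>
    <span class="c-ru-graph__number">1.0</span>
    <span class="c-ru-graph__label">EE</span>
    <span class="c-ru-graph__number">2.0</span>
  </div>
</div>
"""


def region_frame(region, chart_names):
    """Build a metric-by-company table for one region element."""
    rows = {}
    for chart_name in chart_names:
        chart = region.find("div", {"data-chart-name": chart_name})
        if chart is None:
            continue  # this region has no chart for the metric
        title = chart.find("span", {"class": "js-metric-name"}).text
        labels = [s.text for s in chart.find_all("span", class_="c-ru-graph__label")]
        numbers = [s.text for s in chart.find_all("span", class_="c-ru-graph__number")]
        rows[title] = dict(zip(labels, numbers))
    return pd.DataFrame.from_dict(rows, orient="index")


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
charts = ["4g-availability", "upload-speed"]

tables = {}
for region_id in ["eastern", "example-region"]:
    region = soup.find("div", {"id": region_id})
    if region is not None:
        tables[region_id] = region_frame(region, charts)

for region_id, table in tables.items():
    print(region_id)
    print(table)
```

Against the live page, the list of region ids would have to be filled in by hand or collected from the page itself, and the values stay strings until they are converted.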


This helps, but the other answer automates the collection for all regions. Thank you.
– EricA
