
I need to parse this page:

http://www.webpagetest.org/breakdown.php?test=150325_34_0f581da87c16d5aac4ecb7cd07cda921&run=2&cached=0

If you view the source of the above URL, you will find a series of fvRequests.setValue(...) calls.

Expected output:

fvRequests=css
fvRequests=7
Comment:

fvRequests.setValue(0, 0, 'css')
fvRequests.setValue(0, 1, 7)
fvBytes.setValue(0, 0, 'css')
fvBytes.setValue(0, 1, 110557)

Commented Apr 5, 2015 at 6:15

2 Answers

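import re
from urllib.request import urlopen


if __name__ == "__main__":
    url = 'http://www.webpagetest.org/breakdown.php?test=150325_34_0f581da87c16d5aac4ecb7cd07cda921&run=2&cached=0'

    # HTTP request (Python 3: urllib.request replaces urllib2);
    # decode the bytes (assuming UTF-8) so the regex runs on text
    with urlopen(url) as response:
        html = response.read().decode('utf-8', errors='replace')

    # capture the third argument of every fvRequests.setValue(row, col, value) call
    results = re.findall(r"fvRequests\.setValue\(\d+, \d+, '?(.*?)'?\);", html)
    keys = results[::2]     # even positions: resource types ('css', 'flash', ...)
    values = results[1::2]  # odd positions: request counts, as strings

    # pair each type with its count
    output = dict(zip(keys, values))

    print(output)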
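To check the regex and the even/odd slicing without hitting the network, the same logic can be run on a hardcoded fragment. This is a minimal sketch: SAMPLE is an assumed fragment modeled on the setValue calls quoted in the question's comment (the trailing semicolons are assumed, since the pattern requires them), not the real page source, and parse_requests is a hypothetical helper name.

```python
import re

# Hypothetical fragment modeled on the calls quoted in the question's comment;
# the real page contains more rows and other statements around them.
SAMPLE = """
fvRequests.setValue(0, 0, 'css');
fvRequests.setValue(0, 1, 7);
fvBytes.setValue(0, 0, 'css');
fvBytes.setValue(0, 1, 110557);
fvRequests.setValue(1, 0, 'js');
fvRequests.setValue(1, 1, 35);
"""

# Same pattern as the answer: the optional quotes let one group capture
# both the quoted type name and the bare number.
PATTERN = r"fvRequests\.setValue\(\d+, \d+, '?(.*?)'?\);"


def parse_requests(text):
    """Return {resource_type: request_count_string} for fvRequests rows."""
    results = re.findall(PATTERN, text)
    keys = results[::2]     # column 0 of each row: the type name
    values = results[1::2]  # column 1 of each row: the count
    return dict(zip(keys, values))


if __name__ == "__main__":
    print(parse_requests(SAMPLE))  # {'css': '7', 'js': '35'}
```

Note that the fvBytes rows are skipped because the pattern is anchored on the literal name fvRequests.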

The idea is to locate the script with BeautifulSoup, then use a regular expression to find the fvRequests.setValue() calls and extract the value of the third argument:

import re

from bs4 import BeautifulSoup
import requests


# capture the third argument of fvRequests.setValue(row, col, value), quoted or not
pattern = re.compile(r"fvRequests\.setValue\(\d+, \d+, '?(\w+)'?\);")

response = requests.get("http://www.webpagetest.org/breakdown.php?test=150325_34_0f581da87c16d5aac4ecb7cd07cda921&run=2&cached=0")
soup = BeautifulSoup(response.content, "html.parser")

# the first <script> whose text contains the setValue calls
script = soup.find("script", string=lambda x: x and "fvRequests.setValue" in x).text
data = pattern.findall(script)
print(data)

Prints:

[u'css', u'7', u'flash', u'0', u'font', u'0', u'html', u'14', u'image', u'80', u'js', u'35', u'other', u'14']

You can go further and pack the list into a dict by pairing up consecutive items:

dict(zip(*([iter(data)] * 2)))

which would produce:

{
    'image': '80', 
    'flash': '0', 
    'js': '35', 
    'html': '14',  
    'font': '0', 
    'other': '14', 
    'css': '7'
}
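The zip(*([iter(data)] * 2)) idiom works because the list holds the same iterator twice, so each tuple zip builds takes two consecutive items. The sketch below uses the values from the printed list above (pair_up is a hypothetical helper name) to show that it gives the same result as the [::2] / [1::2] slicing from the first answer, and converts the counts to integers.

```python
data = ['css', '7', 'flash', '0', 'font', '0', 'html', '14',
        'image', '80', 'js', '35', 'other', '14']


def pair_up(items):
    """Group a flat [k1, v1, k2, v2, ...] list into {k1: v1, k2: v2, ...}."""
    # [iter(items)] * 2 is a list holding the SAME iterator twice, so each
    # tuple zip builds consumes two consecutive items; a trailing odd item is dropped.
    return dict(zip(*([iter(items)] * 2)))


by_iter = pair_up(data)
by_slice = dict(zip(data[::2], data[1::2]))  # the slicing approach from the first answer
assert by_iter == by_slice

# The values are still strings; convert them if you need numbers.
counts = {key: int(value) for key, value in by_iter.items()}
print(counts["image"], counts["js"])  # 80 35
```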
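If installing BeautifulSoup is not an option, the script-locating step can be approximated with the standard library's html.parser.HTMLParser. This is a swapped-in technique, not what the answer uses, and only a rough sketch: ScriptCollector and extract_requests are hypothetical names, and SAMPLE_HTML is a made-up fragment rather than the real page.

```python
import re
from html.parser import HTMLParser

# Same pattern as the answer above: capture the third setValue argument.
PATTERN = re.compile(r"fvRequests\.setValue\(\d+, \d+, '?(\w+)'?\);")


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # script text may arrive in several chunks, so append rather than assign
        if self._in_script:
            self.scripts[-1] += data


def extract_requests(html):
    parser = ScriptCollector()
    parser.feed(html)
    parser.close()
    # Like soup.find(...), use the first script that mentions the calls.
    for script in parser.scripts:
        if "fvRequests.setValue" in script:
            return PATTERN.findall(script)
    return []


# Hypothetical page fragment; the real page is much larger.
SAMPLE_HTML = """<html><head>
<script src="other.js"></script>
<script type="text/javascript">
fvRequests.setValue(0, 0, 'css');
fvRequests.setValue(0, 1, 7);
</script>
</head><body></body></html>"""

print(extract_requests(SAMPLE_HTML))  # ['css', '7']
```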
