Parsing .js page python

Question

I need to parse the file at http://timetable.ait.ie/js/filter.js. I have been using BeautifulSoup over the past few days to parse HTML pages and I understand what I am doing there, but this .js file is killing me.

At the moment I am using the following code:

import urllib
page = urllib.urlopen("http://timetable.ait.ie/js/filter.js")
pageInfo = page.read()

This returns a string containing the whole file, 18283 lines of code. I am trying to get the staff names, which are stored in an array towards the bottom of the file:

staffarray[373][0] = "BRADY, DAMIEN";
staffarray[373][1] = "SCI";
staffarray[373][2] = "BRADY001608";

I need the values from [0] and [1], and then I want to build a database with these values that I can reference later.

I have tried using a regex to find the staffarray entries, but I am completely frustrated trying to get this information. Can anyone help me, please?

urllib and requests only read data from the server. BeautifulSoup lets you find tags in HTML, e.g. a <script> tag containing JavaScript code. But you need standard string functions or a regex to find anything inside the JavaScript code itself.
– furas, Commented Nov 12, 2016 at 1:20
If you have a problem with regex, then use standard string functions: first split the file into lines, then find the lines containing staffarray[, e.g. if "staffarray[" in line:. Then you can easily extract the interesting values using other string functions or slicing.
– furas, Commented Nov 12, 2016 at 1:23

furas · Accepted Answer · 2016-11-12 01:34:05Z

If you have trouble with regex, use standard string functions and slicing instead.

First split the code into lines, then look for lines containing staffarray[ together with [0] or [1]. Finally, use slicing to extract the value.

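# Python 3: urlopen lives in urllib.request and read() returns bytes.
from urllib.request import urlopen

req = urlopen("http://timetable.ait.ie/js/filter.js")
# The encoding is assumed to be UTF-8; undecodable bytes are replaced.
lines = req.read().decode('utf-8', errors='replace').splitlines()

for x in lines:
    if 'staffarray[' in x:
        # Slice between the first and last double quote, so the result
        # does not depend on how the line ends (";" or ";\r").
        start = x.find('"') + 1
        end = x.rfind('"')
        if '[0] = ' in x:
            print('0', x[start:end])
        elif '[1] = ' in x:
            print('1', x[start:end])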
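
The loop above prints the two fields separately. As a minimal sketch of the same string-function approach, the values could be paired per staff index so they are easier to store later. The function name `parse_staff` and the sample text are hypothetical, and the sketch assumes every assignment sits on its own line in the form shown in the question (Python 3).

```python
def parse_staff(js_text):
    """Return {index: [value_0, value_1]} from staffarray assignments."""
    staff = {}
    for line in js_text.splitlines():
        line = line.strip()
        if not line.startswith('staffarray['):
            continue
        # Split 'staffarray[373][0] = "BRADY, DAMIEN";' at the first '='.
        left, sep, right = line.partition('=')
        if not sep:
            continue
        # 'staffarray[373][0] ' -> ['373', '0 ']
        indices = left.replace(']', '').split('[')[1:]
        if len(indices) != 2:
            continue
        row, col = indices[0].strip(), indices[1].strip()
        if not row.isdigit() or col not in ('0', '1'):
            continue
        # Take the text between the first and last double quote.
        start = right.find('"') + 1
        end = right.rfind('"')
        if start == 0 or end < start:
            continue
        staff.setdefault(int(row), [None, None])[int(col)] = right[start:end]
    return staff


# Sample lines copied from the question.
sample = '''
staffarray[373][0] = "BRADY, DAMIEN";
staffarray[373][1] = "SCI";
staffarray[373][2] = "BRADY001608";
'''
print(parse_staff(sample))
```

Keying the result by the first index keeps each name next to its [1] value even if the two lines are not adjacent in the file.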

Thanks a million for the help @furas
– Matthew Swart, Commented over a year ago

Stonecold · Accepted Answer · 2016-11-12 01:44:16Z

You could write a regexp pattern with capturing groups:

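import re

# Compile once; [^"]* stops at the closing quote and \s*=\s* tolerates
# varying spaces around the equals sign.
pattern = re.compile(
    r'staffarray\[(?P<first_index>\d+)\]\s*'
    r'\[(?P<second_index>\d+)\]\s*=\s*"(?P<value>[^"]*)"'
)

with open('filter.js') as js_file:  # a local copy of the downloaded file
    for line in js_file:
        match = pattern.search(line)
        if match:
            first_index, second_index, value = match.groups()
            # do something with data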
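
Since the question mentions building a database from the [0] and [1] values, here is one possible sketch of the "do something with data" step using the standard-library `sqlite3` module. The function name `load_staff`, the table name `staff`, and the column names are hypothetical, and treating the [1] value as a department code is an assumption based on the "SCI" sample (Python 3).

```python
import re
import sqlite3

# Same idea as the pattern above; [^"]* stops at the closing quote.
PATTERN = re.compile(r'staffarray\[(\d+)\]\s*\[(\d+)\]\s*=\s*"([^"]*)"')


def load_staff(js_text, conn):
    """Parse staffarray entries and store them in a 'staff' table."""
    records = {}
    for row, col, value in PATTERN.findall(js_text):
        if col in ('0', '1'):  # only the name and the assumed department
            records.setdefault(int(row), {})[col] = value
    conn.execute(
        'CREATE TABLE IF NOT EXISTS staff '
        '(id INTEGER PRIMARY KEY, name TEXT, department TEXT)'
    )
    conn.executemany(
        'INSERT OR REPLACE INTO staff VALUES (?, ?, ?)',
        [(idx, rec.get('0'), rec.get('1'))
         for idx, rec in sorted(records.items())],
    )
    conn.commit()


# Sample lines copied from the question.
sample = '''
staffarray[373][0] = "BRADY, DAMIEN";
staffarray[373][1] = "SCI";
staffarray[373][2] = "BRADY001608";
'''
conn = sqlite3.connect(':memory:')  # in-memory database for the demo
load_staff(sample, conn)
rows = conn.execute('SELECT id, name, department FROM staff').fetchall()
print(rows)
```

Passing a file path instead of ':memory:' to `sqlite3.connect` would keep the data on disk so it can be referenced later.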

Thanks for the answer, I got it sorted a while back.
– Matthew Swart, Commented over a year ago
