Exploring an attribute in Python

Question

I am parsing HTML from the following website: http://www.asusparts.eu/partfinder/Asus/All In One/E Series. Is there any way to explore a parsed attribute in Python? For example, the code below produces the output shown after it:

datas = s.find(id='accordion')
a = datas.findAll('a')

for data in a:
    if data.has_attr('onclick'):
        model_info.append(data['onclick'])
        print data

[OUTPUT]

<a href="#Bracket" onclick="getProductsBasedOnCategoryID('Asus','Bracket','ET10B','7138', this, 'E Series')">Bracket</a>

These are the values I would like to retrieve:

nCategoryID = Bracket

nModelID = ET10B

family = E Series

The page is rendered via AJAX, and the site's script file builds the following URL:

url = 'http://json.zandparts.com/api/category/GetCategories/' + country + '/' + currency + '/' + nModelID + '/' + family + '/' + nCategoryID + '/' + brandName + '/' + null

How can I retrieve only the three values listed above?

[EDIT]

import string, urllib2, urlparse, csv, sys
from urllib import quote
from urlparse import urljoin
from bs4 import BeautifulSoup
from ast import literal_eval

changable_url = 'http://www.asusparts.eu/partfinder/Asus/All%20In%20One/E%20Series'
page = urllib2.urlopen(changable_url)
base_url = 'http://www.asusparts.eu'
soup = BeautifulSoup(page)

#Array to hold all options
redirects = []
#Array to hold all data
model_info = []

print "FETCHING OPTIONS"
select = soup.find(id='myselectListModel')
#print select.get_text()


options = select.findAll('option')

for option in options:
    if(option.has_attr('redirectvalue')):
       redirects.append(option['redirectvalue'])

for r in redirects:
    rpage = urllib2.urlopen(urljoin(base_url, quote(r)))
    s = BeautifulSoup(rpage)
    #print s



    print "FETCHING MAIN TITLE"
    #Finding all the headings for each specific Model
    maintitle = s.find(id='puffBreadCrumbs')
    print maintitle.get_text()

    #Find entire HTML container holding all data, rendered by AJAX
    datas = s.find(id='accordion')

    #Find all 'a' tags inside data container
    a = datas.findAll('a')

    #Find all 'span' tags inside data container
    content = datas.findAll('span')

    print "FETCHING CATEGORY" 

    #Find all 'a' tags which have an attribute of 'onclick' Error:(doesn't display anything, can't seem to find
    #'onclick' attr
    if(hasattr(a, 'onclick')):
        arguments = literal_eval('(' + a['onclick'].replace(', this', '').split('(', 1)[1])
        model_info.append(arguments)
        print arguments #arguments[1] + " " + arguments[3] + " " + arguments[4] 


    print "FETCHING DATA"
    for complete in content:
        #Find all 'class' attributes inside 'span' tags
        if(complete.has_attr('class')):
            model_info.append(complete['class'])

            print complete.get_text()

    #Find all 'table data cells' inside table held in data container       
    print "FETCHING IMAGES"
    img = s.find('td')

    #Find all 'img' tags held inside these 'td' cells and print out
    images = img.findAll('img')
    print images

I have marked the line where the problem lies with an Error comment.

Jon Clements · Accepted Answer · 2013-04-22 13:01:30Z

1

Similar to Martijn's answer, but makes primitive use of pyparsing (i.e., it could be refined to recognise the function and only take quoted strings within the parentheses):

from bs4 import BeautifulSoup
from pyparsing import QuotedString
from itertools import chain

s = '''<a href="#Bracket" onclick="getProductsBasedOnCategoryID('Asus','Bracket','ET10B','7138', this, 'E Series')">Bracket</a>'''
soup = BeautifulSoup(s)
for a in soup('a', onclick=True):
    print list(chain.from_iterable(QuotedString("'", unquoteResults=True).searchString(a['onclick'])))
# ['Asus', 'Bracket', 'ET10B', '7138', 'E Series']
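The refinement mentioned above (recognise the function, then take only the quoted strings inside its parentheses) could be sketched with the standard library's re module. Note that this swaps pyparsing for plain regular expressions, so it is not the answer's own method. The names quoted_args, CALL_RE and QUOTED_RE are hypothetical, and the sketch assumes Python 3 and single-quoted arguments with no embedded or escaped quotes.

```python
import re

# One call: a function name, then everything between the outer parentheses.
CALL_RE = re.compile(r"^\s*(\w+)\s*\((.*)\)\s*;?\s*$", re.S)
# One single-quoted string (assumes no embedded or escaped quotes).
QUOTED_RE = re.compile(r"'([^']*)'")


def quoted_args(onclick, expected_name="getProductsBasedOnCategoryID"):
    """Return the quoted arguments of the expected call, or [] if it does not match."""
    match = CALL_RE.match(onclick)
    if match is None or match.group(1) != expected_name:
        return []
    # Bare identifiers such as `this` are skipped because they are not quoted.
    return QUOTED_RE.findall(match.group(2))


onclick = ("getProductsBasedOnCategoryID('Asus','Bracket','ET10B','7138', "
           "this, 'E Series')")
print(quoted_args(onclick))
```

Checking the function name first means unrelated onclick handlers on the page yield an empty list instead of stray strings.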


Martijn Pieters · Accepted Answer · 2013-04-22 14:50:44Z

1

You could parse that as a Python literal if you remove the ", this" part from it and take only the argument list inside the parentheses:

from ast import literal_eval

if data.has_attr('onclick'):
    arguments = literal_eval('(' + data['onclick'].replace(', this', '').split('(', 1)[1])
    model_info.append(arguments)
    print arguments

We remove the this argument because it is a bare JavaScript identifier rather than a valid Python literal, and you don't need it anyway.

Demo:

>>> literal_eval('(' + "getProductsBasedOnCategoryID('Asus','Bracket','ET10B','7138', this, 'E Series')".replace(', this', '').split('(', 1)[1])
('Asus', 'Bracket', 'ET10B', '7138', 'E Series')

Now you have a Python tuple and can pick out any value you like.

You want the values at indices 1, 2 and 4, for example:

nCategoryID, nModelID, family = arguments[1], arguments[2], arguments[4]
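For reference, here is a Python 3 sketch of the same literal_eval approach wrapped in a small helper. The name parse_onclick_args is made up for illustration, and the sketch assumes the attribute always has the call shape shown in the question.

```python
from ast import literal_eval


def parse_onclick_args(onclick):
    """Return the literal arguments of a JavaScript-style call as a tuple."""
    # Keep only the text after the first '(' and drop the bare `this`
    # identifier, which is not a valid Python literal.
    inner = onclick.split('(', 1)[1].replace(', this', '')
    # Re-add the opening parenthesis so the text reads as a tuple literal.
    return literal_eval('(' + inner)


onclick = ("getProductsBasedOnCategoryID('Asus','Bracket','ET10B','7138', "
           "this, 'E Series')")
arguments = parse_onclick_args(onclick)

# Indices 1, 2 and 4 hold the category, the model and the family.
nCategoryID, nModelID, family = arguments[1], arguments[2], arguments[4]
print(nCategoryID, nModelID, family)
```

literal_eval only evaluates literals, so unlike eval it will not execute arbitrary code found in the scraped attribute.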

edited Apr 22, 2013 at 14:50

answered Apr 22, 2013 at 12:49


10 Comments

Martijn Pieters Over a year ago

You are still just printing data, not what you extracted from the onclick attribute.

Martijn Pieters Over a year ago

Print arguments instead; the return value of literal_eval.

ash Over a year ago

I am trying to understand what you have done here, but without success. I only want to display ('Bracket', 'ET10B', 'E Series'), but I get the error shown in the edit above.

Martijn Pieters Over a year ago

@ash: Don't mess with the string, just use the code I gave you; just grab what you need from arguments. I've added an example.

ash Over a year ago

Yes, I tried that initially, but it threw an error saying tuple index out of range.
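A note on the edited code in the question: findAll returns a list-like result set, and hasattr(a, 'onclick') looks for a Python attribute on that list, not an HTML attribute on each tag. The branch is therefore skipped and nothing is printed. Each tag has to be checked on its own, as the has_attr loop in the first snippet does. The sketch below shows the per-tag loop in Python 3. It uses html.parser from the standard library in place of BeautifulSoup so that it runs without third-party packages. The class OnclickCollector, the helper name and the second anchor in the sample markup are made up for illustration. The length check is one way to avoid "tuple index out of range"; that some onclick values carry fewer arguments is an assumption about the cause, not something the thread confirms.

```python
from ast import literal_eval
from html.parser import HTMLParser


class OnclickCollector(HTMLParser):
    """Collect the onclick value of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.onclicks = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            value = dict(attrs).get('onclick')
            if value:
                self.onclicks.append(value)


def parse_onclick_args(onclick):
    # Same technique as in the answer: drop `this`, evaluate the rest as a tuple.
    inner = onclick.split('(', 1)[1].replace(', this', '')
    return literal_eval('(' + inner)


# First anchor is from the question; the second one is invented sample markup.
html = """<div id="accordion">
<a href="#Bracket" onclick="getProductsBasedOnCategoryID('Asus','Bracket','ET10B','7138', this, 'E Series')">Bracket</a>
<a href="#top">Back to top</a>
</div>"""

collector = OnclickCollector()
collector.feed(html)

model_info = []
for onclick in collector.onclicks:
    arguments = parse_onclick_args(onclick)
    # Index only when the call really carried all five quoted arguments.
    if isinstance(arguments, tuple) and len(arguments) >= 5:
        model_info.append((arguments[1], arguments[2], arguments[4]))

print(model_info)
```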
