Automating web browsing with JavaScript under Python

Question

I'm looking for a package or technique to automate web browsing. For example, I have these search results (sorry, the page is in Russian): http://www.consultant.ru/search/?q=N+145-%D0%A4%D0%97+%D0%BE%D1%82+31.07.1998

I want to retrieve the value of the variable “item.n” (line 399 of the page source) from Python. It looks like an internal variable of the JavaScript function “onSearchLoaded”, but if you hover the mouse pointer over a search result you will see n=160111; that is the value of item.n I'm trying to get. Which Python packages could help me do that?

user3960432 · Accepted Answer · 2014-08-21 13:37:20Z

2

You don't have to extract the JavaScript variable itself, only the place where its value is used. In this case the value ends up in the href of each link in the search results.

There are a number of libraries you can use for automation; which one fits depends on how much of the browser's behavior you need to reproduce. I prefer Selenium for this kind of task. Combine it with Python's standard re module for regular expressions and you have a basic example. Here is a quick mockup using Selenium:

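from selenium import webdriver
from selenium.webdriver.common.by import By
import re

url = "http://www.consultant.ru/search/?q=N+145-%D0%A4%D0%97+%D0%BE%D1%82+31.07.1998"
pattern = re.compile(r"n=(\d+)")  # captures the number that follows "n=" in a link
xpath = '//div[@id = "baseSrch"]//a'

browser = webdriver.Firefox()
browser.get(url)
# Read the hrefs while the browser is still open; elements are unusable afterwards.
hrefs = [element.get_attribute("href")
         for element in browser.find_elements(By.XPATH, xpath)]
browser.quit()

for href in hrefs:
    match = pattern.search(href or "")  # href is None for anchors without one
    if match:
        print(match.group(1))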

Which yields:

However, this isn't the only way: you could also replace Selenium with urllib, requests, lxml, etc. There are many ways to extract the information.
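
As a rough sketch of the non-browser route, the parsing step can be done with the standard library alone. The markup in SAMPLE_HTML is made up for illustration (only the baseSrch id and the n= parameter come from the example above), and LinkCollector and extract_doc_numbers are hypothetical names. Because the search results on the real page appear to be filled in by JavaScript, HTML fetched with urllib or requests may not contain these links at all; in that case a browser-driven approach is still needed.

```python
import re
from html.parser import HTMLParser

# Matches the number after "n=" when it starts the string or follows ?, & or ;.
DOC_NUMBER = re.compile(r"(?:^|[?&;])n=(\d+)")

# Made-up markup that only mimics the structure assumed in the answer.
SAMPLE_HTML = """
<div id="baseSrch">
  <a href="online.cgi?req=doc;base=LAW;n=160111">Result</a>
  <a href="/about/">No document number here</a>
</div>
<a href="/other?n=999">Outside the results block</a>
"""


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags nested inside <div id="baseSrch">."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # div nesting depth inside the results block
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            # Enter the results block, or go one level deeper inside it.
            if self.depth or attrs.get("id") == "baseSrch":
                self.depth += 1
        elif tag == "a" and self.depth and attrs.get("href"):
            self.hrefs.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1


def extract_doc_numbers(html):
    """Return the n=... values found in links inside the results block."""
    collector = LinkCollector()
    collector.feed(html)
    matches = (DOC_NUMBER.search(href) for href in collector.hrefs)
    return [m.group(1) for m in matches if m]


print(extract_doc_numbers(SAMPLE_HTML))
```

The pattern also accepts ";" as a separator because the document links quoted in the comments use it (req=doc;base=LAW;n=...).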


2 Comments

user2598356 Over a year ago

Do you also know whether I can extract the text that a tag contains? For example, the phrase “Утратил силу” (“No longer in force”) from line 417 of the source of base.consultant.ru/cons/cgi/online.cgi?req=doc;base=LAW;n=72596. As far as I understand, to access the dtitle div tag (line 405) I need to do something like:

url_to_doc = "http://base.consultant.ru/cons/cgi/online.cgi?req=doc;base=LAW;n=72596"
xpathDoc = '//div[@id = "dtitle"]'
browser = webdriver.Firefox()
page = browser.get(url_to_doc)
elements = browser.find_elements_by_xpath(xpathDoc)

but I don't see how to get the text out of the resulting elements...

user3960432 Over a year ago

Assuming you have the right XPath (I haven't checked), you just need to loop through the elements and read the .text attribute of each one. This returns the inner text between the tags. Note, though, that on Python 2 you will probably have to call .encode() on the text if you want to print it.
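
A minimal sketch of that loop, assuming Selenium and Firefox are available for the real run. The names element_texts and fetch_texts are hypothetical helpers, and the fake elements at the bottom are stand-ins that let the text-collecting part be tried without a browser.

```python
from types import SimpleNamespace


def element_texts(elements):
    """Return the stripped, non-empty .text of each element-like object."""
    texts = (element.text.strip() for element in elements)
    return [text for text in texts if text]


def fetch_texts(url, xpath):
    """Open url in Firefox and return the text of the elements matching xpath."""
    # Imported here so element_texts can be tried without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    browser = webdriver.Firefox()
    try:
        browser.get(url)
        # Read .text while the browser is still open.
        return element_texts(browser.find_elements(By.XPATH, xpath))
    finally:
        browser.quit()


# Stand-ins for WebElement objects, so this part runs without a browser.
fake_elements = [SimpleNamespace(text="  Утратил силу \n"), SimpleNamespace(text="")]
print(element_texts(fake_elements))
```

With a browser available, the call would look like fetch_texts(url_to_doc, '//div[@id = "dtitle"]'), using the URL and XPath from the comment above; as noted, that XPath has not been checked against the page.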
