
I'm looking for a package or a way to automate web browsing. For example, I have these search results (sorry for the Russian): http://www.consultant.ru/search/?q=N+145-%D0%A4%D0%97+%D0%BE%D1%82+31.07.1998

I want to retrieve the value of the variable “item.n” (line 399 of the page source) from Python. It looks like an internal variable of the JavaScript function “onSearchLoaded”, but if you hover the mouse pointer over the search result you will see n=160111; that is the value of item.n I'm trying to get. Which Python packages could help me do that?

1 Answer


You don't have to extract the JavaScript variable itself, only the place where its value is used. In this case it ends up in the href of each result link returned by the search.

There are a bunch of different libraries you can use for automation; which one fits depends on the level of automation you need. I prefer selenium for this kind of task. Couple that with Python's built-in re module and you can create a basic example. I was able to write a quick mockup using selenium:

from selenium import webdriver
from selenium.webdriver.common.by import By
import re

url = "http://www.consultant.ru/search/?q=N+145-%D0%A4%D0%97+%D0%BE%D1%82+31.07.1998"
pattern = re.compile(r"n=(\d+)")
xpath = '//div[@id = "baseSrch"]//a'

browser = webdriver.Firefox()
browser.get(url)
# Read the hrefs while the browser is still open; the elements cannot be queried after it closes.
hrefs = [element.get_attribute("href") for element in browser.find_elements(By.XPATH, xpath)]
browser.quit()

for href in hrefs:
    match = pattern.search(href or "")  # href is None for anchors without one
    if match:
        print(match.group(1))

Which yields:

160111

However, this isn't the only way: you could also substitute urllib, requests, lxml, etc. There are many different ways to extract the information.
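Whichever library fetches the page, the extraction step itself only needs the href strings. Here is a minimal sketch of that step on its own, assuming the links carry the id as an n= parameter separated by "?", ";" or "&" (as in the online.cgi URLs on this site). The helper name extract_doc_ids and the sample hrefs are made up for illustration.

```python
import re

# Matches the "n" parameter whether it follows "?", ";" or "&".
DOC_ID = re.compile(r"[?;&]n=(\d+)")


def extract_doc_ids(hrefs):
    """Return the ids found in an iterable of href strings, in order, without duplicates."""
    seen = []
    for href in hrefs:
        if not href:  # anchors without an href give None
            continue
        match = DOC_ID.search(href)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


# Hypothetical sample data, shaped like the links on the results page.
sample_hrefs = [
    "http://base.consultant.ru/cons/cgi/online.cgi?req=doc;base=LAW;n=160111",
    "http://www.consultant.ru/search/?q=test",
    None,
]
print(extract_doc_ids(sample_hrefs))
```

Anchoring the pattern on the separator keeps it from matching parameters whose names merely end in "n".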


2 Comments

Do you know whether I can also extract the text that a tag contains? For example, the phrase “Утратил силу” ("No longer in force") on line 417 of the source of base.consultant.ru/cons/cgi/online.cgi?req=doc;base=LAW;n=72596. As far as I understand, to access the dtitle div tag (line 405) I need to do something like: url_to_doc = "http://base.consultant.ru/cons/cgi/online.cgi?req=doc;base=LAW;n=72596", xpathDoc = '//div[@id = "dtitle"]', browser = webdriver.Firefox(), page = browser.get(url_to_doc), elements = browser.find_elements_by_xpath(xpathDoc), but I don't see how to get at the text in the elements instance...
Assuming your XPath is right (I haven't checked), you just need to loop over the elements and read .text on each one. This returns the inner text between the tags. Note that in Python 2 you will probably have to call .encode() on it if you want to print it out.
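
To make the last comment concrete, here is a rough sketch of looping over the elements and reading .text. The helper names element_texts and fetch_title_texts are hypothetical, the XPath is the unverified one from the comment above, and the code assumes Python 3 with selenium and Firefox installed (in Python 3, print handles Unicode, so .encode() is usually not needed).

```python
def element_texts(elements):
    """Collect the visible text of each element, skipping empty ones."""
    texts = []
    for element in elements:
        text = element.text.strip()
        if text:
            texts.append(text)
    return texts


def fetch_title_texts(url, xpath='//div[@id = "dtitle"]'):
    """Open url in Firefox and return the text of every element matching xpath."""
    # Imported here so element_texts can be used without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    browser = webdriver.Firefox()
    try:
        browser.get(url)
        # .text must be read while the browser is still open.
        return element_texts(browser.find_elements(By.XPATH, xpath))
    finally:
        browser.quit()
```

Calling fetch_title_texts("http://base.consultant.ru/cons/cgi/online.cgi?req=doc;base=LAW;n=72596") would then return a list of strings, provided the XPath actually matches something on that page.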
