0

I'm trying to extract, with python, some javascript variables from an HTML site:

<script>
var nData = new Array();
var Data = "5b7b......";
nData = CallInit(Data);
...
...
</script>

I can see the content of "nData" in firebug (DOM Panel) without problem:

[Object { height="532",  width="1280",  url="https://example.org...8EDA4F3F5F395B9&key=lh1",  more...}, Object { height="266",  width="640",  url="https://example.org...8EDA4F3F5F395B9&key=lh1",  more...}]

The content of nData is an URL. How can i parse/extract the content of nData to python? It's possible?

Thanks

4
  • Can you give us a link to the site? Commented Apr 17, 2015 at 9:23
  • Do you have influence on the source code in a JS context before moving it to python? For example open webpage, insert a JS-write statement and save it as HTML. So you can write the variable as html first and then parse it via python. Commented Apr 17, 2015 at 9:24
  • If not you need kind of javascript runtime environment. May checkout the answers of stackoverflow.com/questions/2346584/… and stackoverflow.com/questions/2894946/…. Commented Apr 17, 2015 at 9:30
  • @wenzul no, i'm only trying to extract the url from the site, and use it in a python script. Commented Apr 17, 2015 at 9:33

1 Answer 1

3

With the help of the python library Ghost.py it should be possible to get a dynamic variable out of executed Javascript code.

I just tried it out with some small test site and got a Javascript variable named a which I use on that page as a python object. I did the following:

  1. Install Ghost.py with pip install Ghost.py.

  2. Install PySide (it's a prerequisite for Ghost.py) with pip install PySide.

  3. Use the following python code:

    from ghost import Ghost
    ghost = Ghost()
    ghost.open('https://dl.dropboxusercontent.com/u/13991899/test/index.html')
    js_variable, _ = ghost.evaluate('a', expect_loading=True)
    print js_variable
    

You should be able to get your variable nData into the python variable js_variable by opening your site with ghost.open and then call ghost.evaluate('nData').

Sign up to request clarification or add additional context in comments.

3 Comments

Cool, didn't know ghost. Just mechanize and stuff. For just retrieving the urls you could just look into CallInit and build your urls with python with Data as a parameter.
It's possible to do the same but using machanize?
Please update the ghost library based on its official website's information. I found the ghost class now only have ghost.start() in its newest version, and it is using sessions to manage the crawling.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.