1
ghost = Ghost()
page, rcs = ghost.open(https://soundcloud.com/passionpit/sets/favorites)
page, rcs = ghost.wait_for_page_loaded()
songs = ghost.evaluate("document.getElementsByClassName('soundTitle__title');")
print songs

I am attempting to use the above code to find all html elements on the above page that have the class 'soundTitle__title' however as of right now my output is

QFont::setPixelSize: Pixel size <= 0 (0)
({PyQt4.QtCore.QString(u'length'): 0.0}, [])

Can anyone help me see where my problem is? When I run document.getElementsByClassName('soundTitle__title') in my browsers console I get the output I expect, why is the Python output different?

Or is there some way for me to use Ghost.py or another similar library to get the source of the page after the JavaScript has run (the source seen when inspecting an element with browser developer tools)?

8
  • I'm sorry, but I just have to do this with lxml.html. from lxml import html html.parse("https://soundcloud.com/passionpit/sets/favorites").getroot().cssselect(".soundTitle__title") Commented Jun 24, 2014 at 22:12
  • I tried running your code and ran into some issues. My output is IOError: Error reading file 'https://soundcloud.com/passionpit/sets/favorites': failed to load external entity "https://soundcloud.com/passionpit/sets/favorites" Commented Jun 24, 2014 at 22:20
  • TUrns out html.parse cannot load https. Commented Jun 24, 2014 at 22:31
  • There is SoundCloud API - something for developers. Commented Jun 25, 2014 at 1:35
  • There is even Python wrapper around the SoundCloud API Commented Jun 25, 2014 at 1:37

1 Answer 1

4

I got this working, and would recommend, using Splinter, which is basically just running phantomjs and selenium under the hood.

You'll need to run pip install splinter and also install phantomjs on your machine, either by downloading/untarring or npm -g install phantomjs if you have npm, etc. But overall the installation and dependencies are minimal and straightforward.

The following code returns 'Ryn Weaver - OctaHate', which I'm assuming is what you're looking for, although without more context I can't be totally sure.

from splinter import Browser

browser = Browser('phantomjs')
browser.visit('https://soundcloud.com/passionpit/sets/favorites')
songs = browser.find_by_xpath("//a[contains(@class, 'soundTitle__title')]")
if songs:
    for song in songs:
        print song.text
else:
    print "there aren't any songs"

You'll also notice that I had to do an xpath-contains to get the class description that you were looking for; so, you might be running into a problem when trying to access that class by the notation you were using - there is a span element and also an anchor element that both contain 'soundTitle__title' but as far as I could tell, only the 'a' element had text and I would guess that's what you're looking for. But if you want both you could just do browser.find_by_xpath("//*[contains(@class, 'soundTitle__title')]")

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.