1

As far as I can tell, this is the case for LyricWikia. The lyrics (example) can be accessed from the browser, but can't be found in the source code (can be opened with CTRL + U in most browsers) or reading the contents of the site with Python:

from urllib.request import urlopen

URL = 'http://lyrics.wikia.com/Billy_Joel:Piano_Man'

r = urlopen(URL).read().decode('utf-8')

And the test:

>>> 'Now John at the bar is a friend of mine' in r
False
>>> 'John' in r
False

But when you select and look at the source code of the box in which the lyrics are displayed, you can see that there is: <div class="lyricbox">[...]</div>

Is there a way to get the contents of that div-element with Python?

1 Answer 1

2

You can try Ghost.py, which is essentially Phantom.js for Python. It embeds WebKit and is thus able to execute the JavaScript on the page as if you had navigated to the page manually. It then gives you access to the DOM structure.

Sign up to request clarification or add additional context in comments.

1 Comment

This is exactly what I've been looking for, coincidentally. Answered my question before I even asked it :)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.