I am trying to get my script to download subtitles from www.subscene.com. The problem is that the download button on the webpage is implemented in JavaScript, and for some reason I cannot download the subtitles even after extracting the URL.

I think this is the code for the download button:

<a id="s_lc_bcr_downloadLink" class="downloadLink rating0" href="javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions(&quot;s$lc$bcr$downloadLink&quot;, &quot;&quot;, true, &quot;&quot;, &quot;/english/How-I-Met-Your-Mother-Seventh-Season/subtitle-482407-dlpath-90698/zip.zipx&quot;, false, true))">Download English Subtitle</a><a id="s_lc_bcr_previewLink" href="javascript:togglePreview(482407, 'zip');">(See preview)</a>

So I extract the URL and tell my script to download it:

urllib.urlretrieve('http://subscene.com/english/How-I-Met-Your-Mother-Seventh-Season/subtitle-482407-dlpath-90698/zip.zipx','c:\\sub.zip')

(I added 'http://subscene.com' in front of the path.)

But for some reason it doesn't download the right file. What am I supposed to do?

EDIT:

Thanks a lot! Unfortunately, I can't get it to work :( This is what I ran and the error it gives:

from selenium import webdriver

browser = webdriver.Firefox()
browser.execute_script('WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions("s$lc$bcr$downloadLink", "", true, "", "/english/How-I-Met-Your-Mother-Seventh-Season/subtitle-482407-dlpath-90698/zip.zipx", false, true))')

Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    browser.execute_script('WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions("s$lc$bcr$downloadLink", "", true, "", "/english/How-I-Met-Your-Mother-Seventh-Season/subtitle-482407-dlpath-90698/zip.zipx", false, true))')
  File "C:\Users\User\AppData\Roaming\Python\Python27\site-packages\selenium\webdriver\remote\webdriver.py", line 385, in execute_script
    {'script': script, 'args':converted_args})['value']
  File "C:\Users\User\AppData\Roaming\Python\Python27\site-packages\selenium\webdriver\remote\webdriver.py", line 153, in execute
    self.error_handler.check_response(response)
  File "C:\Users\User\AppData\Roaming\Python\Python27\site-packages\selenium\webdriver\remote\errorhandler.py", line 126, in check_response
    raise exception_class(message, screen, stacktrace)
WebDriverException: Message: ''
  • What you're trying to download (zip.zipx) is not the file; that's some JavaScript. I am looking into how to get the URL of the download. Commented Nov 27, 2011 at 19:32
  • It is going to be hard to find the actual URL of each file. It seems everything is retrieved from the server via JavaScript, which I don't think produces a URL, other than maybe the local directory; you would have to take a good look at the site's JavaScript and how it handles these files. I noticed something along the lines of http://subscene.com/downloadissue.aspx?subtitleId=482407&contentType=zip, which suggests it looks up the subtitleId, checks for a contentType of zip, and grabs the file from there. That is probably organised with some form of SQL. Commented Nov 27, 2011 at 19:38

1 Answer

As John said, this is not the file but JavaScript code. So instead of fetching that path with urllib.urlretrieve, you can execute the JavaScript, which in turn downloads the file. This can be done using the selenium module:

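from selenium import webdriver
# Start a Firefox instance controlled by selenium.
browser = webdriver.Firefox()
# Load the subtitle page first so that its JavaScript functions are defined.
browser.get('http://subscene.com/english/How-I-Met-Your-Mother-Seventh-Season/subtitle-482407.aspx')
# Run the same JavaScript call that the download link's href contains.
browser.execute_script('WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions("s$lc$bcr$downloadLink", "", true, "", "/english/How-I-Met-Your-Mother-Seventh-Season/subtitle-482407-dlpath-90698/zip.zipx", false, true))')
# Wait for Enter so the script does not exit right away (Python 2; use input() on Python 3).
raw_input()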

I got this JavaScript snippet using Firebug.
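If you would rather not copy the snippet by hand for every subtitle, it can probably be derived from the link's href as it appears in the page source. The following is only a rough sketch under that assumption: the helper names postback_script and download_path are made up for illustration, and it assumes the href looks like the one quoted in the question (a javascript: prefix with double quotes escaped as &quot;). It uses only the re module and should run on both Python 2.7 and Python 3.

```python
import re


def postback_script(href):
    """Turn a 'javascript:...' href from the page source into plain JavaScript.

    Hypothetical helper: assumes the href has the same shape as the
    download link quoted in the question.
    """
    prefix = "javascript:"
    if not href.startswith(prefix):
        raise ValueError("not a javascript: link")
    # The raw page source escapes double quotes as &quot;, so undo that.
    return href[len(prefix):].replace("&quot;", '"')


def download_path(script):
    """Return the first double-quoted argument that starts with '/', or None."""
    match = re.search(r'"(/[^"]*)"', script)
    return match.group(1) if match else None


# Example with the href from the question.
href = ('javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions('
        '&quot;s$lc$bcr$downloadLink&quot;, &quot;&quot;, true, &quot;&quot;, '
        '&quot;/english/How-I-Met-Your-Mother-Seventh-Season/'
        'subtitle-482407-dlpath-90698/zip.zipx&quot;, false, true))')
script = postback_script(href)
print(script)
print(download_path(script))
```

The string returned by postback_script is what you would pass to browser.execute_script after browser.get has loaded the subtitle page. download_path is only there to show which argument is the path from the question; as noted above, fetching that path directly does not give you the file.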

5 Comments

Very nice, @theharshest. I guess you could achieve similar results with the mechanize Python library, but this is elegant enough. Doesn't it also require you to install the Selenium Java server, etc.?
@alonisser Thanks, and yes, you need to install the selenium module for Python. Installing modules is very simple with pip.
Well, glad someone was able to help him out. +1.
Thanks a lot, please check my next post :)
@theharshest: I added some code to "make it work". Hope you don't mind. Please correct if I did it improperly.
