
I am trying to use Selenium in Python to save webpages on MacOS Firefox.

So far, I have managed to press COMMAND + S to open the Save As window. However, I don't know how to:

  1. change the directory of the file,
  2. change the name of the file, and
  3. click the SAVE AS button.

Could someone help?

Below is the code I have used to press COMMAND + S:

ActionChains(browser).key_down(Keys.COMMAND).send_keys("s").key_up(Keys.COMMAND).perform()

The reason I am using this method is that I get a UnicodeEncodeError when I:

  1. write the page_source to an HTML file, and
  2. store scraped information in a CSV file.

Writing to an HTML file:

file_object = open(completeName, "w")
html = browser.page_source
file_object.write(html)
file_object.close() 

Writing to a CSV file:

csv_file_write.writerow(to_write)

Error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xf8' in position 1: ordinal not in range(128)
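On Python 2, writing a unicode string to a file opened with the built-in open() implicitly encodes it with the ASCII codec, which is where this error comes from. The sketch below, assuming Python 3 syntax and a made-up sample string, triggers the same encode step explicitly and shows that UTF-8 can represent the character.

```python
# Hypothetical sample text with U+00F8 (ø) at index 1
text = u"K\u00f8benhavn"

failed_at = None
try:
    text.encode("ascii")  # ASCII only covers code points 0-127
except UnicodeEncodeError as exc:
    # exc.start is the index of the first character ASCII could not encode
    failed_at = exc.start

encoded = text.encode("utf-8")  # UTF-8 can encode every code point
print(failed_at)  # 1
print(encoded)    # b'K\xc3\xb8benhavn'
```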

  • I ended up not using the Save As method. To solve the HTML-file and CSV-file writing problems, I used codecs and unicodecsv. Refer to RemcoW's comment and this post stackoverflow.com/questions/18766955/… for details. Commented Jun 15, 2016 at 13:40
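On Python 3, the standard csv module handles Unicode directly once the file is opened with an explicit encoding, so unicodecsv is only needed on Python 2. A minimal sketch, assuming Python 3; the rows and the file name are hypothetical:

```python
import csv
import os
import tempfile

# Hypothetical scraped rows containing non-ASCII text
rows = [[u"name", u"city"], [u"Bj\u00f8rn", u"K\u00f8benhavn"]]
path = os.path.join(tempfile.gettempdir(), "scraped.csv")

# newline='' lets the csv module control line endings, as the csv docs recommend;
# encoding='utf-8' avoids any ASCII or locale-dependent default
with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerows(rows)

# Read the file back to check that the characters survived
with open(path, newline="", encoding="utf-8") as f:
    read_back = list(csv.reader(f))
```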

4 Answers

# encoding='utf-8' avoids UnicodeEncodeError on non-ASCII pages (Python 3)
with open('page.html', 'w', encoding='utf-8') as f:
    f.write(driver.page_source)

2 Comments

Note that driver.page_source can crash with pages larger than 200MB in most webdrivers. For huge pages, using ActionChains is more reliable.
On Python 2, with Unicode in the page source, you might need driver.page_source.encode('utf-8').
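Following that comment, one way to avoid any implicit ASCII or locale encoding is to encode the page source yourself and write the bytes in binary mode, which behaves the same on Python 2 and 3. A minimal sketch; save_html_bytes is a hypothetical helper name, and a plain string stands in for driver.page_source so the snippet runs without a browser:

```python
import os
import tempfile


def save_html_bytes(html, path):
    """Encode html as UTF-8 and write the raw bytes to path (hypothetical helper)."""
    data = html.encode("utf-8")
    # 'wb' writes the bytes unchanged, so no implicit encoding step happens
    with open(path, "wb") as f:
        f.write(data)


# Stand-in for driver.page_source
html = u"<html><body>K\u00f8benhavn</body></html>"
path = os.path.join(tempfile.gettempdir(), "page.html")
save_html_bytes(html, path)
```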

What you are trying to achieve is impossible to do with Selenium. The dialog that opens is not something Selenium can interact with.

The closest thing you could do is collect the page_source, which gives you the entire HTML of a single page, and save it to a file.

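import codecs
import os

completeName = os.path.join(save_path, file_name)
# codecs.open encodes the unicode page source as UTF-8 on write;
# the with-block closes the file afterwards
with codecs.open(completeName, "w", "utf-8") as file_object:
    html = browser.page_source
    file_object.write(html)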
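On Python 2.6+ and Python 3, io.open also accepts an encoding argument (in Python 3 it is the same function as the built-in open), so it can serve as an alternative to codecs.open. A minimal sketch, assuming a string in place of browser.page_source and a temporary file path chosen for illustration:

```python
import io
import os
import tempfile

# Stand-in for browser.page_source
html = u"<p>\u00f8</p>"
complete_name = os.path.join(tempfile.gettempdir(), "index.html")

# io.open encodes text as UTF-8 on write and closes the file when the block exits
with io.open(complete_name, "w", encoding="utf-8") as file_object:
    file_object.write(html)

# Read it back with the same encoding to confirm nothing was lost
with io.open(complete_name, encoding="utf-8") as f:
    round_trip = f.read()
```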

If you really need to save the entire website, you should look into a tool like AutoIt, which makes it possible to interact with the save dialog (note that AutoIt runs only on Windows).

6 Comments

Thank you! I am aware of this method. However, because my webpages contain characters that trigger UnicodeEncodeErrors, I need to save the webpages in their original format to avoid losing important information. An example of the error is: 'ascii' codec can't encode character u'\xf8' in position 1: ordinal not in range(128).
@TommyN When are you getting this error? When trying to write the page_source to the file?
Yes, it happens when I try to write the page_source to an HTML file. Do you know of any way to minimize the information lost with regard to those special characters? (I intentionally don't want to use 'ignore'.)
@RemcoW Do you think I can use codecs for writing to a CSV file as well?
@TommyN Take a look at this question for that: stackoverflow.com/questions/18766955/…
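If the output really must be ASCII, Python's xmlcharrefreplace error handler keeps every special character as an HTML numeric character reference instead of dropping it the way ignore does, and a browser renders the reference as the original character. A minimal sketch, assuming Python 3 and a made-up sample string:

```python
# Hypothetical sample containing U+00F8 (ø)
text = u"Bj\u00f8rn"

# 'ignore' silently drops characters ASCII cannot represent
lossy = text.encode("ascii", "ignore")

# 'xmlcharrefreplace' substitutes &#NNN; references, so no information is lost
preserved = text.encode("ascii", "xmlcharrefreplace")

print(lossy)      # b'Bjrn'
print(preserved)  # b'Bj&#248;rn'
```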

You cannot interact with system dialogs such as the Save File dialog. If you want to save the page HTML, you can do something like this:

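page = driver.page_source
# encoding='utf-8' handles non-ASCII characters (Python 3);
# the with-block closes the file even if write() raises
with open('page.html', 'w', encoding='utf-8') as file_:
    file_.write(page)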

1 Comment

Getting the HTML can also be accomplished by using driver.page_source. This spares you from finding the html element and getting its outerHTML manually.
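For comparison, the outerHTML route this comment refers to can be done with Selenium's execute_script, which returns the value of the script's JavaScript return statement. A sketch; get_outer_html is a hypothetical helper name, and the driver argument is assumed to be any Selenium WebDriver instance, for example webdriver.Firefox():

```python
def get_outer_html(driver):
    """Return the serialized <html> element of the page currently loaded in driver."""
    # execute_script runs the JavaScript in the page and hands back its return value
    return driver.execute_script("return document.documentElement.outerHTML;")
```

Usage would be html = get_outer_html(browser), after which the string can be written out with any of the UTF-8 approaches above.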

This is a complete, working example of the answer RemcoW provided:

You first have to install a webdriver, e.g. pip install selenium chromedriver_installer.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

# core modules
import codecs
import os

# 3rd party modules
from selenium import webdriver
from selenium.webdriver.chrome.service import Service


def get_browser():
    """Get the browser (a "driver")."""
    # find the path with 'which chromedriver'
    path_to_chromedriver = '/usr/local/bin/chromedriver'
    # Selenium 4 takes the driver path via a Service object;
    # the old executable_path keyword was removed in Selenium 4.10
    service = Service(executable_path=path_to_chromedriver)
    return webdriver.Chrome(service=service)


save_path = os.path.expanduser('~')
file_name = 'index.html'
browser = get_browser()

url = "https://martin-thoma.com/"
browser.get(url)

complete_name = os.path.join(save_path, file_name)
# write the unicode page source as UTF-8; the with-block closes the file
with codecs.open(complete_name, "w", "utf-8") as file_object:
    file_object.write(browser.page_source)
# quit() ends the whole driver session, not just the current window
browser.quit()
