2

I am trying to implement a scraper that checks twice a day whether certain PDFs have changed names. Unfortunately, finding the PDFs requires interacting with the website, so the best solution in my mind is a combination of Selenium and AWS Lambda.

To begin, I followed this tutorial. I completed it but ran into this error from Lambda:

START RequestId: 18637c6d-ea75-40ee-8789-374654700b99 Version: $LATEST
Starting google.com
Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home
: WebDriverException
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 46, in lambda_handler
    driver = webdriver.Chrome(chrome_options=chrome_options)
  File "/var/task/selenium/webdriver/chrome/webdriver.py", line 68, in __init__
    self.service.start()
  File "/var/task/selenium/webdriver/common/service.py", line 83, in start
    os.path.basename(self.path), self.start_error_message)
selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home

Others experienced this error, and the author "resolved" it by linking to this Stack Overflow page. I have gone through it, but all the answers pertain to using headless Chromium on a desktop, not on AWS Lambda.

A couple of changes I've tried, to no avail:

1) Changing the chromedriver and headless-chromium to .exe files
2) Changing this line of code to include the executable_path

driver = webdriver.Chrome(chrome_options=chrome_options, executable_path=os.getcwd() + "/bin/chromedriver.exe")

Any help getting Selenium and AWS Lambda working together would be greatly appreciated.

2
  • Have you added the downloaded Chromium files to your deployment package? If so, try changing the path in your driver command to something like this: driver = webdriver.Chrome(chrome_options=chrome_options, executable_path=os.getcwd() + "./chromedriver.exe") Commented May 11, 2019 at 16:03
  • Sorry for the late response, but I tried using "./" and am still receiving the same error. Commented May 14, 2019 at 22:05

2 Answers

3

I had the same issue; it was caused by the binary files being in a location from which they couldn't be executed. Adding a function that copies them to /tmp, then loading them from that location, fixed it. See the example below, which I just got working while researching this error. (Apologies for the messy code.)

import os
import shutil
import time

from selenium import webdriver

BIN_DIR = "/tmp/bin"
CURR_BIN_DIR = os.getcwd() + "/bin"

def _init_bin(executable_name):
    # time.clock() was removed in Python 3.8; perf_counter() is the replacement.
    start = time.perf_counter()
    if not os.path.exists(BIN_DIR):
        print("Creating bin folder")
        os.makedirs(BIN_DIR)
    print("Copying binaries for " + executable_name + " in /tmp/bin")
    currfile = os.path.join(CURR_BIN_DIR, executable_name)
    newfile = os.path.join(BIN_DIR, executable_name)
    # Copy the binary out of the deployment package into /tmp.
    shutil.copy2(currfile, newfile)
    print("Giving new binaries permissions for lambda")
    # Make the copy executable.
    os.chmod(newfile, 0o775)
    elapsed = time.perf_counter() - start
    print(executable_name + " ready in " + str(elapsed) + "s.")

def handler(event, context):

    _init_bin("headless-chromium")
    _init_bin("chromedriver")

    chrome_options = webdriver.ChromeOptions()

    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-gpu')
    chrome_options.add_argument('--window-size=1280x1696')
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--hide-scrollbars')
    chrome_options.add_argument('--enable-logging')
    chrome_options.add_argument('--log-level=0')
    chrome_options.add_argument('--v=99')
    chrome_options.add_argument('--single-process')
    chrome_options.add_argument('--ignore-certificate-errors')

    chrome_options.binary_location = "/tmp/bin/headless-chromium"
    driver = webdriver.Chrome("/tmp/bin/chromedriver", chrome_options=chrome_options)
    driver.get('https://en.wikipedia.org/wiki/Special:Random')
    line = driver.find_element_by_class_name('firstHeading').text
    print(line)
    driver.quit()

    return line
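
Lambda may reuse an execution environment between invocations, in which case files written to /tmp can still be present on the next call. A minimal sketch, assuming that reuse, of a variant of _init_bin that skips the copy when an executable copy is already staged. The name stage_executable and its parameters are hypothetical and not part of the code above.

```python
import os
import shutil
import stat


def stage_executable(src_dir, dest_dir, name):
    """Copy `name` from src_dir to dest_dir and mark it executable.

    Hypothetical helper: skips the copy if an executable copy already
    exists in dest_dir (for example, left over from an earlier invocation).
    Returns the path of the staged file.
    """
    src = os.path.join(src_dir, name)
    dest = os.path.join(dest_dir, name)
    if os.path.isfile(dest) and os.access(dest, os.X_OK):
        return dest  # already staged, nothing to do
    os.makedirs(dest_dir, exist_ok=True)
    shutil.copy2(src, dest)
    # Add execute permission for user, group, and others.
    mode = os.stat(dest).st_mode
    os.chmod(dest, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return dest
```

In the handler this could replace the _init_bin calls, for example chrome_options.binary_location = stage_executable(CURR_BIN_DIR, BIN_DIR, "headless-chromium").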


5 Comments

I'm trying out your solution, but I am still getting a "Message: Can not connect to the Service /tmp/bin/chromedriver" error from Selenium. Have you run across this before? What versions of selenium, chromedriver, and headless-chromium are you using?
I don't recall that error. I'm using the binary and chrome driver that are included if you clone and download github.com/ryfeus/lambda-packs/tree/master/Selenium_Chromium/…
I realized I hadn't changed my PYTHONPATH, I think that solved the problem for me.
I followed your solution. However, I am now getting this "errorMessage": "[Errno 2] No such file or directory: '/var/task/bin/headless-chromium'". Do you happen to know the reason? I am running it on AWS Lambda, and it seems the bin dir doesn't have the executable I am trying to copy to /tmp.
Yes, it does seem like that. I'd highly recommend using AWS SAM to get into a Lambda-like environment on your desktop via Docker. It will let you poke around and make sure you have permissions and that the files do exist. It makes deployment much easier too.
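
One way to investigate the "No such file or directory" error from the comment above is to log what the deployment package actually contains before copying anything. A minimal sketch; describe_dir is a hypothetical helper name, and the /var/task/bin path is taken from the error message.

```python
import os


def describe_dir(path):
    """Return (name, size_in_bytes, is_executable) for each entry in path.

    Returns an empty list if the directory does not exist.
    """
    if not os.path.isdir(path):
        return []
    entries = []
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        entries.append((name, os.path.getsize(full), os.access(full, os.X_OK)))
    return entries


# Inside the Lambda handler, before copying the binaries:
# print(describe_dir("/var/task/bin"))
```

An empty list would suggest the bin directory was not included in the uploaded package, or was packaged under a different path.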
0

I also had the same issue, but I have fixed it now. In my case, the Python version was not the same on Lambda and in my Dockerfile.
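
A quick way to spot such a mismatch is to check the interpreter version at the start of the handler and compare it with the version used to build the package. A minimal sketch; EXPECTED_PYTHON and check_python_version are hypothetical names, and the value (3, 7) is only a placeholder for whatever the Dockerfile uses.

```python
import sys

# Placeholder: the major.minor Python version used in the Dockerfile.
EXPECTED_PYTHON = (3, 7)


def check_python_version(expected=EXPECTED_PYTHON, actual=None):
    """Return True if the major.minor version in `actual` matches `expected`.

    If `actual` is None, the version of the running interpreter is used.
    """
    if actual is None:
        actual = tuple(sys.version_info[:2])
    return tuple(actual[:2]) == tuple(expected)
```

Calling check_python_version() inside the handler and printing sys.version when it returns False would make the mismatch visible in the Lambda logs.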
