I have a Selenium parser that needs to run in Docker. On my local machine the script works correctly. Inside a container, Selenium does not seem to work: whenever I search for any element, I get an error saying the element cannot be found. From this I conclude that either Selenium does not run inside Docker or it cannot communicate with the Chrome browser. I tried installing Chrome and ChromeDriver inside the container, and I also tried a remote driver running in another container. The result is always the same. My top priority is to run without a remote driver. Looking forward to your advice, thanks everyone!

My Dockerfile:

FROM python:3.10-slim-buster

RUN mkdir -p /usr/src/app/
WORKDIR /usr/src/app/

COPY . /usr/src/app/

RUN pip install --no-cache-dir -r requirements.txt

RUN  apt-get update \
  && apt-get install -y wget \
  && apt-get install -y gnupg2 \
  && apt-get install -y curl \
  && rm -rf /var/lib/apt/lists/*

# install google chrome
RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -
RUN sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list'
RUN apt-get -y update
RUN apt-get install -y google-chrome-stable

# install chromedriver
RUN apt-get install -yqq unzip
RUN wget -O /tmp/chromedriver.zip http://chromedriver.storage.googleapis.com/`curl -sS chromedriver.storage.googleapis.com/LATEST_RELEASE`/chromedriver_linux64.zip
RUN unzip /tmp/chromedriver.zip chromedriver -d /usr/src/app/chromedriver/


CMD python3 ./script.py

My Python script:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

service = Service(r'chromedriver/chromedriver')
options = webdriver.ChromeOptions()

options.add_argument('--no-sandbox')

# webdriver mode
options.add_argument('--disable-blink-features=AutomationControlled')
# user-agent
options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
                     "Chrome/103.0.0.0 Safari/537.36")

options.add_argument('--window-size=1420,1080')
# headless mode
options.add_argument('--headless')
options.add_argument('--disable-gpu')
# incognito mode
options.add_argument("--incognito")

options.add_experimental_option("excludeSwitches", ["enable-logging"])

driver = webdriver.Chrome(
    service=service,
    options=options
)

  • Could you provide your requirements.txt as well? Commented Jul 28, 2022 at 11:31
  • Why don't you just use a Docker image provided by Selenium? It contains all the features necessary to run your tests. hub.docker.com/u/selenium Commented Jul 28, 2022 at 11:35
  • @MartinTovmassian requirements.txt --> selenium==4.3.0 beautifulsoup4==4.11.1 lxml==4.9.1 requests==2.28.1 uvicorn==0.18.2 fastapi==0.79.0 celery==5.2.7 flower==1.1.0 fake_user_agent==0.0.15 chromedriver-binary==103.0.5060.134.0 Commented Jul 28, 2022 at 12:17
  • @Tork When I use selenium/standalone-chrome I get the following - urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=4444): Max retries exceeded with url: /session (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f5d29a9eaa0>: Failed to establish a new connection: [Errno 111] Connection refused')) Commented Jul 28, 2022 at 12:20
  • @Tork In the script I define the driver like this driver = webdriver.Remote("127.0.0.1:4444", options=options) Commented Jul 28, 2022 at 12:23
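
A possible explanation for the connection error in the comments above, offered as a sketch rather than a confirmed diagnosis: inside a container, 127.0.0.1 refers to that container itself, so a script running in its own container cannot reach a Selenium container at that address. webdriver.Remote also expects a full URL including the scheme. The helper below builds such a URL. The hostname "selenium" and the environment variables SELENIUM_HOST and SELENIUM_PORT are hypothetical names chosen for illustration, not something taken from the question.

```python
import os


def remote_url(host=None, port=None):
    """Build the command-executor URL for webdriver.Remote."""
    # SELENIUM_HOST / SELENIUM_PORT are hypothetical variable names;
    # "selenium" stands in for the Selenium container's hostname.
    host = host or os.environ.get("SELENIUM_HOST", "selenium")
    port = int(port or os.environ.get("SELENIUM_PORT", "4444"))
    return f"http://{host}:{port}"


def connect(options):
    """Open a remote session; needs selenium and a reachable server."""
    # Imported lazily so remote_url() works without selenium installed.
    from selenium import webdriver

    return webdriver.Remote(command_executor=remote_url(), options=options)
```

With both containers on the same Docker network, connect(options) would then target the Selenium container by name instead of the script container's own loopback address.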

2 Answers

I don't know your exact use case or the actual error you get, but using your Dockerfile and your Python script I tried to run the Selenium "Getting Started" example.

I have just added these two lines to your script:

driver.get("http://www.python.org")
print("Python" in driver.title)

On the first run I got this error:

Traceback (most recent call last):
  File "/usr/src/app/script.py", line 29, in <module>
    driver.get("http://www.python.org")
  File "/usr/local/lib/python3.10/site-packages/selenium/webdriver/remote/webdriver.py", line 447, in get
    self.execute(Command.GET, {'url': url})
  File "/usr/local/lib/python3.10/site-packages/selenium/webdriver/remote/webdriver.py", line 435, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.10/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: session deleted because of page crash
from tab crashed
  (Session info: headless chrome=103.0.5060.134)
Stacktrace:

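Based on this answer, I fixed the issue by adding this argument, which makes Chrome write its shared-memory files to /tmp instead of the small /dev/shm that a container gets by default: options.add_argument("--disable-dev-shm-usage")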

And then the script worked as expected.
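
For reference, the fix can be sketched as a small helper that collects the Chrome arguments in one place. This is a minimal sketch, not the exact script used above: the function names container_chrome_args and make_driver are hypothetical, and the driver path is the one from the question.

```python
def container_chrome_args(headless=True):
    """Return Chrome arguments suited to running inside a container."""
    args = [
        "--no-sandbox",             # already present in the question's script
        "--disable-dev-shm-usage",  # the argument that fixed the tab crash
        "--window-size=1420,1080",
    ]
    if headless:
        args.append("--headless")
    return args


def make_driver(driver_path="chromedriver/chromedriver"):
    """Create a Chrome driver; needs selenium, Chrome and chromedriver."""
    # Imported lazily so container_chrome_args() works without selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    for arg in container_chrome_args():
        options.add_argument(arg)
    return webdriver.Chrome(service=Service(driver_path), options=options)
```

Calling make_driver() inside the container, followed by driver.get("http://www.python.org"), corresponds to the test described above.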

1 Comment

This argument solved my problem. Now the script works in docker as it should. Thanks for the help!
0

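Run the container with a larger shared-memory segment:

docker run --shm-size=1gb image_name

or

disable /dev/shm usage in the Chrome options in your script (main.py, or whatever your file is called) by adding the --disable-dev-shm-usage argument.

Basically, Chrome fails under Selenium because /dev/shm in the container is too small.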
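
To check whether shared memory is the limiting factor, the size of /dev/shm can be inspected from inside the container. The following is a rough diagnostic sketch using only the standard library. The function names are hypothetical, and the threshold is an assumption based on Docker's default --shm-size of 64 MB.

```python
import shutil

DOCKER_DEFAULT_SHM = 64 * 1024 * 1024  # Docker's default --shm-size (64 MB)


def shm_total_bytes(path="/dev/shm"):
    """Return the size of the shared-memory mount, or None if it is absent."""
    try:
        return shutil.disk_usage(path).total
    except OSError:
        return None


def shm_looks_small(total, threshold=DOCKER_DEFAULT_SHM):
    """True when the mount is missing or no larger than the threshold."""
    return total is None or total <= threshold


if __name__ == "__main__":
    total = shm_total_bytes()
    print("shm bytes:", total, "| small:", shm_looks_small(total))
```

If the check reports a small mount, either of the two options above (a larger --shm-size or --disable-dev-shm-usage) is worth trying.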
