I want to run BeautifulSoup and Selenium WebDriver in AWS Lambda, and my runtime is Python 3.6. Is this possible? If so, how? I want to scrape data from a web page using Beautiful Soup 4 and Selenium (Selenium is needed because the page's data is generated dynamically by JavaScript).

  • What is the error you face when you try it? Commented Apr 21, 2018 at 7:14
  • Unable to import module 'lambda_function': No module named 'bs4', and Unable to import module 'lambda_function': No module named 'Selenium' @TarunLalwani Commented Apr 21, 2018 at 7:24
  • Did you package those modules as Lambda's documentation instructs? Also remember that there is no Firefox or Chrome inside your Lambda, so you need an external Selenium Grid that your Lambda can reach. Commented Apr 21, 2018 at 7:29
  • You can also scrape without Selenium. If the data only appears after you click a button or scroll, look for the XHR request in the Network tab of your browser's developer tools and request that URL directly. Commented Apr 21, 2018 at 7:32
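A minimal sketch of the XHR approach from the last comment: once you have found the request in the Network tab, you can often fetch its JSON with the standard library and skip the browser entirely. The endpoint URL (passed in as a hypothetical `url` event field) and the `"items"` key are placeholders; inspect the real response to see where your data lives.

```python
import json
import urllib.request


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """Fetch the JSON endpoint spotted in the browser's Network tab (XHR filter)."""
    # Some sites reject requests that lack a browser-like User-Agent header.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_items(payload: dict, key: str = "items") -> list:
    """Return the list stored under `key` in the decoded JSON.

    "items" is a hypothetical key; real endpoints use their own structure.
    """
    items = payload.get(key, [])
    return items if isinstance(items, list) else []


def lambda_handler(event, context):
    # Hypothetical input: the XHR endpoint URL is passed in the event.
    payload = fetch_json(event["url"])
    return {"count": len(extract_items(payload))}
```

Because this needs no third-party packages or browser binary, it also sidesteps the `No module named` import errors above.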

2 Answers

Yes, it's possible. You need to package a headless Chrome binary and chromedriver along with all the Python packages you need. You'll also need to set several options in Selenium's Chrome web driver to make it work.

I wrote a step-by-step tutorial after spending several frustrating weeks trying to deploy it.
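The answer doesn't list which options it means, so here is a sketch using flags commonly passed to headless Chrome on Lambda; they are not necessarily the tutorial's exact set. The binary paths and the `url` event field are hypothetical, and `make_driver` assumes the Selenium 4 API, which no longer supports the Python 3.6 runtime from the question.

```python
import os

# Hypothetical locations: they depend on how you package Chrome and chromedriver.
CHROME_BINARY = os.environ.get("CHROME_BINARY", "/opt/headless-chromium")
CHROMEDRIVER = os.environ.get("CHROMEDRIVER", "/opt/chromedriver")


def chrome_arguments(tmp_dir: str = "/tmp") -> list:
    """Command-line flags commonly used for headless Chrome on Lambda."""
    return [
        "--headless",
        "--no-sandbox",              # Chrome's sandbox generally can't start in Lambda
        "--disable-gpu",
        "--single-process",          # run browser and renderer in one process
        "--disable-dev-shm-usage",   # write shared-memory files to /tmp, not /dev/shm
        "--window-size=1280,1696",
        f"--user-data-dir={tmp_dir}/user-data",   # /tmp is Lambda's only writable path
        f"--disk-cache-dir={tmp_dir}/cache-dir",
    ]


def make_driver():
    """Build a Chrome WebDriver (Selenium 4 API) using the flags above."""
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    options.binary_location = CHROME_BINARY
    for arg in chrome_arguments():
        options.add_argument(arg)
    return webdriver.Chrome(service=Service(executable_path=CHROMEDRIVER), options=options)


def lambda_handler(event, context):
    driver = make_driver()
    try:
        driver.get(event["url"])       # hypothetical event field
        html = driver.page_source      # JavaScript-rendered HTML, ready for BeautifulSoup
    finally:
        driver.quit()                  # always release the browser process
    return {"length": len(html)}
```

Selenium is imported inside `make_driver` so the flag list can be inspected without the package installed; the rendered `html` can then be handed to BeautifulSoup as usual.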


You will need to create a deployment package and upload it to Lambda if you are going to use dependencies outside the standard library.
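A minimal sketch of building that package with the standard library. It assumes the handler lives in `lambda_function.py` (the module named in the error from the question) and that dependencies are installed with `pip install --target`. Lambda imports the handler and its libraries from the root of the zip, so `lambda_function.py`, `bs4/` and `selenium/` must all sit there, not inside a subfolder. The directory and zip names are hypothetical.

```python
import os
import subprocess
import sys
import zipfile


def install_dependencies(target_dir: str, packages: list) -> None:
    """pip-install packages into target_dir so they get zipped with the handler."""
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "--target", target_dir, *packages],
        check=True,
    )


def build_package(source_dir: str, zip_path: str) -> list:
    """Zip everything under source_dir with paths relative to it, so that
    lambda_function.py and the library folders end up at the zip root.

    Returns the archive names that were written.
    """
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(source_dir):
            for filename in files:
                full_path = os.path.join(root, filename)
                arcname = os.path.relpath(full_path, source_dir)
                archive.write(full_path, arcname)
                names.append(arcname)
    return names
```

Usage, with `package/` as a hypothetical folder already containing `lambda_function.py`: call `install_dependencies("package", ["beautifulsoup4", "selenium"])`, then `build_package("package", "deployment.zip")`, and upload `deployment.zip`. Keep the zip file itself outside `package/` so it doesn't get walked into the archive.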

I have a write-up about using BS4 and Lambda together. I did not use Selenium within Lambda, but I do have extensive Selenium experience. You will not be able to run a browser inside Lambda itself, so you will need a remote server running Selenium Server. Download Selenium Server and the browser drivers onto the machine you want to do the scraping from, then start the .jar file; it opens a port (4444 by default) that your Selenium client communicates with.
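A sketch of the Lambda side of this setup: the function only runs the Selenium client and sends commands to the remote server over the network. The host address is a hypothetical placeholder, 4444 is Selenium Server's default port, and `/wd/hub` is the classic endpoint path (adjust it to your server version). It assumes Selenium 4's `options=` keyword, and the server must be reachable from your Lambda.

```python
import os

# Hypothetical address of the machine running Selenium Server.
SELENIUM_HOST = os.environ.get("SELENIUM_HOST", "203.0.113.10")
SELENIUM_PORT = int(os.environ.get("SELENIUM_PORT", "4444"))  # Selenium Server default


def hub_url(host: str, port: int = 4444, path: str = "/wd/hub") -> str:
    """Build the URL that the Remote WebDriver client sends commands to."""
    return f"http://{host}:{port}{path}"


def scrape_remotely(page_url: str) -> str:
    """Drive a browser on the remote server and return the rendered HTML."""
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    driver = webdriver.Remote(
        command_executor=hub_url(SELENIUM_HOST, SELENIUM_PORT),
        options=options,
    )
    try:
        driver.get(page_url)
        return driver.page_source
    finally:
        driver.quit()  # end the remote session so the server frees the browser


def lambda_handler(event, context):
    html = scrape_remotely(event["url"])  # hypothetical event field
    return {"length": len(html)}
```

With this split, only the pure-Python `selenium` client has to be in the deployment package; the browser and its driver live on the remote machine.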

Since you will probably need a separate machine (likely running Windows) to launch a browser and scrape these pages, you may not need Lambda at all.

2 Comments

You can definitely use Selenium inside Lambda; this is a good starting point: github.com/21Buttons/pychromeless
You can run pretty much any application headless, as long as there is a way to control it without a mouse.
