I want to run BeautifulSoup and Selenium WebDriver in AWS Lambda, and my runtime is Python 3.6. Is it possible? If so, how? I want to scrape data from a web page using Beautiful Soup 4 and Selenium (Selenium is needed because some of the data is generated dynamically by JavaScript).
2 Answers
Yes, it's possible. You need to package a headless Chrome binary and chromedriver along with all the Python packages you need. You'll also need to set several options in Selenium's Chrome web driver to make it work.
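A rough sketch of what those Chrome options can look like in a handler follows. It is not the tutorial's code. It assumes Selenium 4, which needs Python 3.7+; on the 3.6 runtime you would need Selenium 3, where the chromedriver path is passed directly as `webdriver.Chrome(executable_path=...)`. It also assumes the Chromium binary and chromedriver ship in a Lambda layer at the hypothetical paths `/opt/headless-chromium` and `/opt/chromedriver`. The flag list is a set commonly used for Lambda, not a definitive one. The `{"url": ...}` event shape is made up for illustration.

```python
"""Sketch: headless Chrome + Selenium + BeautifulSoup inside an AWS Lambda handler.

Assumptions (not from the answer): Selenium 4, and Chromium/chromedriver
shipped in a Lambda layer at the hypothetical paths below.
"""

# Hypothetical locations; Lambda layers are extracted under /opt at runtime.
CHROME_BINARY = "/opt/headless-chromium"
CHROMEDRIVER = "/opt/chromedriver"


def build_chrome_args():
    """Return Chrome command-line flags commonly used when running in Lambda."""
    return [
        "--headless",               # Lambda has no display
        "--no-sandbox",             # Chrome's own sandbox typically fails to start in Lambda
        "--disable-gpu",
        "--disable-dev-shm-usage",  # write shared-memory files to /tmp instead of /dev/shm
        "--single-process",         # run the renderer inside the browser process
        "--window-size=1280,1696",
    ]


def make_driver():
    # Imported here so build_chrome_args() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    options = Options()
    options.binary_location = CHROME_BINARY
    for arg in build_chrome_args():
        options.add_argument(arg)
    return webdriver.Chrome(service=Service(executable_path=CHROMEDRIVER), options=options)


def lambda_handler(event, context):
    from bs4 import BeautifulSoup

    driver = make_driver()
    try:
        driver.get(event["url"])  # hypothetical event shape: {"url": "https://..."}
        # page_source serializes the current DOM, including JavaScript-generated content
        soup = BeautifulSoup(driver.page_source, "html.parser")
        return {"title": soup.title.string if soup.title else None}
    finally:
        driver.quit()  # always shut Chrome down so it doesn't linger in a warm container
```

Handing `driver.page_source` to BeautifulSoup keeps the parsing code identical to a plain requests + BS4 scraper. Only the HTML source changes. For content that loads late, you may still need a Selenium wait before reading `page_source`.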
I wrote a step-by-step tutorial after spending several frustrating weeks trying to deploy it.
You will need to create a deployment package and upload it to Lambda if you are going to use dependencies outside of the standard library.
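As an illustration, here is a minimal sketch of building such a package with only the standard library. It assumes you have already installed the dependencies into a folder next to your handler, for example with `pip install -t package beautifulsoup4`. The folder name `package`, the handler file `lambda_function.py` and the helper `build_deployment_zip` are hypothetical. The key detail is that Lambda's Python runtime expects the handler and the installed packages at the root of the .zip, not inside a parent folder.

```python
import os
import zipfile


def build_deployment_zip(src_dir, zip_path):
    """Zip everything under src_dir so its contents sit at the archive root.

    Returns the sorted list of archive member names.
    """
    target = os.path.abspath(zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                if os.path.abspath(full) == target:
                    continue  # don't add the archive to itself
                # Path relative to src_dir, so bs4/ etc. land at the zip root
                zf.write(full, os.path.relpath(full, src_dir))
        return sorted(zf.namelist())
```

For example, `build_deployment_zip("package", "deployment.zip")` produces a file you can upload in the Lambda console or with the AWS CLI.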
I have a write-up about using BS4 and Lambda together. I did not use Selenium within Lambda, but I do have extensive Selenium experience. You will not be able to execute commands within a browser using Lambda. Instead, you will need a remote server running Selenium Server. Download Selenium Server and the web drivers onto the machine you want to do the scraping on, then start the .jar file. It opens a port on that machine, which Selenium will communicate with.
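With that setup, the Lambda only needs the Selenium client library and BS4, and it drives the remote browser over the network. The sketch below makes several assumptions. It uses a Selenium 4 client. The Selenium Server listens on its default port 4444 and is reachable from the Lambda (for example, in the same VPC). The host address `10.0.0.5` and the helpers `grid_url` and `scrape_title` are hypothetical. The `/wd/hub` path is the classic Selenium Server endpoint.

```python
"""Sketch: Lambda drives a browser on a remote Selenium Server, then parses with BS4.

Assumptions: Selenium 4 client in the deployment package and a reachable
Selenium Server at a hypothetical address.
"""

SELENIUM_HOST = "10.0.0.5"  # hypothetical address of the Selenium Server machine


def grid_url(host, port=4444):
    """Build the WebDriver endpoint; 4444 is Selenium Server's default port."""
    return "http://{}:{}/wd/hub".format(host, port)


def scrape_title(page_url, host=SELENIUM_HOST):
    # Imported lazily so grid_url() can be used without Selenium or BS4 installed.
    from bs4 import BeautifulSoup
    from selenium import webdriver

    driver = webdriver.Remote(command_executor=grid_url(host), options=webdriver.ChromeOptions())
    try:
        driver.get(page_url)  # the browser runs on the remote machine, not in Lambda
        soup = BeautifulSoup(driver.page_source, "html.parser")
        return soup.title.string if soup.title else None
    finally:
        driver.quit()  # release the remote browser session


def lambda_handler(event, context):
    return {"title": scrape_title(event["url"])}  # hypothetical event shape
```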
Considering that you will need a machine, probably running Windows, to launch a browser and scrape these pages, you probably don't need Lambda in the end.
…firefox or chrome inside your Lambda. So you need an external grid that is reachable from your Lambda for it to work.