I am currently writing a WebCrawler that runs on 8 threads: each thread fetches a page, scrapes it for links, and then checks whether those links have already been captured. Any new links are stored.
This all works, but I've since run into memory problems, so I have started migrating the crawler to store its data in a MySQL database.
The problem I'm having is getting each thread to interact with the database independently: checking for existing data and inserting new data when required.
It currently works with one thread, but as soon as I scale up the thread pool, I get "connection is already open" errors.
Each thread has its own connection object, created on that thread, for connecting to the database. Am I wrong to assume that these connections can be kept separate?
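To make the question concrete, here is roughly the pattern I am aiming for, sketched in Python with sqlite3 standing in for MySQL (the table and column names are made up for illustration): each worker thread opens its own connection, and the "check whether the link exists, then insert" step is collapsed into a single atomic `INSERT OR IGNORE`.

```python
import queue
import sqlite3
import threading

DB_PATH = "crawler.db"  # illustrative path; a real crawler would use MySQL

def init_db():
    conn = sqlite3.connect(DB_PATH)
    conn.execute("DROP TABLE IF EXISTS links")  # start fresh for this demo
    conn.execute("CREATE TABLE links (url TEXT PRIMARY KEY)")
    conn.commit()
    conn.close()

def worker(url_queue):
    # Each thread opens its OWN connection; connections are never shared
    # across threads.
    conn = sqlite3.connect(DB_PATH)
    while True:
        try:
            url = url_queue.get_nowait()
        except queue.Empty:
            break
        # INSERT OR IGNORE makes "check then insert" one atomic statement,
        # so two threads cannot both store the same URL.
        cur = conn.execute("INSERT OR IGNORE INTO links (url) VALUES (?)", (url,))
        conn.commit()
        if cur.rowcount == 1:
            pass  # a new link: this is where it would be queued for crawling
    conn.close()

init_db()
q = queue.Queue()
for url in ["http://a.example", "http://b.example", "http://a.example"]:
    q.put(url)
threads = [threading.Thread(target=worker, args=(q,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With MySQL, the equivalent would be `INSERT IGNORE` (or `INSERT ... ON DUPLICATE KEY UPDATE`) against a table with a unique key on the URL column, which also avoids the race between checking for a link and inserting it.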