I'm working on a project that requires documents to be OCR'd, with the extracted text stored and made searchable. The biggest obstacle is the performance of full-text search over the extracted text.
My idea is to use SQL Server for data persistence and Elasticsearch for fast searching. Once a document's text has been extracted, it would be inserted into the database, and only if that insert succeeds would it be indexed in Elasticsearch.
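To make the write path concrete, here is a minimal Python sketch of the flow I have in mind. It is only an illustration under some loud assumptions: `sqlite3` stands in for SQL Server so the snippet runs anywhere, `InMemorySearchIndex` is a made-up stand-in for an Elasticsearch index (it is not the real client API), and the table, column, and function names are hypothetical.

```python
import sqlite3


class InMemorySearchIndex:
    """Hypothetical stand-in for an Elasticsearch index (not a real client)."""

    def __init__(self):
        self._docs = {}

    def index(self, doc_id, text):
        # Store lowercased text so searches are case-insensitive.
        self._docs[doc_id] = text.lower()

    def search(self, term):
        # Naive substring match; a real search engine would use an inverted index.
        term = term.lower()
        return sorted(d for d, text in self._docs.items() if term in text)


def create_store():
    # Assumption: sqlite3 is used here in place of SQL Server.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents ("
        "id INTEGER PRIMARY KEY, "
        "filename TEXT NOT NULL, "
        "ocr_text TEXT NOT NULL)"
    )
    return conn


def store_and_index(conn, index, filename, ocr_text):
    """Persist the OCR text first; index it only if the insert committed."""
    try:
        with conn:  # commits on success, rolls back on an exception
            cur = conn.execute(
                "INSERT INTO documents (filename, ocr_text) VALUES (?, ?)",
                (filename, ocr_text),
            )
            doc_id = cur.lastrowid
    except sqlite3.Error:
        return None  # insert failed, so nothing is sent to the search index
    index.index(doc_id, ocr_text)
    return doc_id
```

In this sketch the database stays the source of truth and the index only ever receives rows that were committed. It does not handle the case where the insert succeeds but the indexing call fails, which would leave a stored document that is not searchable.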
Can anyone see any pitfalls with this setup, or offer any insight into how it could be done better?