I have very large XML files, sometimes over 100 MB. I need to populate my Elasticsearch database with the information from these files. My server is written in Node.js. What's the best way to go about doing this?
2 Answers
There are a couple of ways you could achieve your goal:
Load and parse your XML in a Node.js program, and use the elasticsearch node module to index the parsed XML into Elasticsearch. For files this large, a streaming XML parser will keep you from holding the whole file in memory. You might want to look into the bulk index API in particular for speedy indexing.
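As a minimal sketch of the bulk API side of this: the `_bulk` endpoint accepts NDJSON, with an action line followed by a source line for each document. Here plain objects stand in for records parsed out of your XML (in a real program they would come from a streaming parser such as the `sax` npm module, and you would send the body via the client's `bulk` method); the index name and fields are made up for illustration.

```javascript
// Build an Elasticsearch _bulk request body (NDJSON) from parsed records.
// Each document contributes two lines: an action/metadata line and the
// document source itself.
function buildBulkBody(records, indexName) {
  const lines = [];
  for (const record of records) {
    lines.push(JSON.stringify({ index: { _index: indexName } }));
    lines.push(JSON.stringify(record));
  }
  // The bulk API requires a trailing newline after the final line.
  return lines.join('\n') + '\n';
}

// Example: two parsed records become a four-line bulk body.
const body = buildBulkBody(
  [{ title: 'First doc' }, { title: 'Second doc' }],
  'my-xml-index'
);
console.log(body);
```

Sending one bulk request per few thousand records (rather than one request per document) is usually where the speedup comes from.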
Use Logstash to set up a pipeline that reads from the XML files and indexes them into Elasticsearch. Logstash is a plugin-based system with plugins for the input, filter, and output stages of the pipeline, similar to the extract, transform, and load stages of an ETL pipeline. You might want to look into the file input plugin, the xml filter plugin, and the elasticsearch output plugin.
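A rough outline of what that pipeline config could look like (the path, index name, and `target` field are placeholders; note the file input reads line by line, so XML documents spanning multiple lines need a multiline codec or similar handling):

```
input {
  file {
    path => "/data/xml/*.xml"
    start_position => "beginning"
  }
}
filter {
  xml {
    source => "message"
    target => "doc"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "my-xml-index"
  }
}
```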
3 Comments
I found a free e-book called Exploring Elasticsearch, and there is a chapter on piping almost 10 GB of Wikipedia XML data into an Elasticsearch database: http://exploringelasticsearch.com/searching_wikipedia.html I plan to use this in conjunction with the elasticsearch node module.