
I have very large XML files, sometimes over 100 MB. I need to populate my Elasticsearch database with the information from these files. My server is written in Node.js. What's the best way to go about doing this?

2 Answers


There are a couple of ways you could achieve your goal:

  1. Load and parse the XML in a Node.js program and use the elasticsearch node module to index the parsed documents into Elasticsearch. Look into the bulk index API in particular for fast indexing; a rough sketch of this approach follows the list.

  2. Use Logstash to set up a pipeline that reads the XML files and indexes them into Elasticsearch. Logstash is a plugin-based system with plugins for the input, filter, and output stages of the pipeline, similar to the extract, transform, and load stages of an ETL pipeline. Look into the file input plugin, the xml filter plugin, and the elasticsearch output plugin.
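For option 1, here is a minimal sketch of what streaming a large XML file and bulk-indexing it could look like. It assumes the sax and elasticsearch npm modules, a hypothetical index named "myindex", records wrapped in <doc> elements with simple child fields, and batches of 1,000 documents; adjust the element names and the bulk action format to your Elasticsearch version (newer versions no longer use a _type in the action metadata).

    // Rough sketch: stream-parse the XML with sax and index batches with the
    // Elasticsearch bulk API. Index name, element names, and batch size are
    // placeholders; error handling and backpressure are kept minimal.
    var fs = require('fs');
    var sax = require('sax');
    var elasticsearch = require('elasticsearch');

    var client = new elasticsearch.Client({ host: 'localhost:9200' });
    var parser = sax.createStream(true); // strict mode

    var current = null; // document currently being assembled
    var field = null;   // child element we are currently inside
    var batch = [];     // accumulated bulk actions

    parser.on('error', function (err) {
      console.error('XML parse error', err);
    });

    parser.on('opentag', function (node) {
      if (node.name === 'doc') current = {};   // start of a record
      else if (current) field = node.name;     // start of a field
    });

    parser.on('text', function (text) {
      if (current && field && text.trim()) {
        current[field] = (current[field] || '') + text.trim();
      }
    });

    parser.on('closetag', function (name) {
      if (name === 'doc' && current) {
        // The bulk API takes pairs of entries: action metadata, then the source.
        batch.push({ index: { _index: 'myindex', _type: 'doc' } }, current);
        current = null;
        if (batch.length >= 2000) flush();     // 1,000 documents per bulk request
      } else {
        field = null;
      }
    });

    parser.on('end', flush);

    function flush() {
      if (!batch.length) return;
      var body = batch;
      batch = [];
      client.bulk({ body: body }, function (err) {
        if (err) console.error('bulk indexing failed', err);
      });
    }

    fs.createReadStream('huge.xml').pipe(parser);

In practice you would also want to pause the read stream while a bulk request is in flight (or queue the batches) so that pending requests do not pile up in memory faster than Elasticsearch can accept them.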


3 Comments

Do you recommend storing the XML data in Elasticsearch or putting it in a separate DB like MongoDB or Postgres?
If I use Logstash, do I need to host the XML in a database, or can I populate Elasticsearch directly from local files?
That depends: how do you plan to use the XML data once it is stored (either in Elasticsearch or in a separate database)?

I found a free e-book called Exploring Elasticsearch with a chapter on piping almost 10 GB of Wikipedia XML data into an Elasticsearch database (http://exploringelasticsearch.com/searching_wikipedia.html). I plan to use this in conjunction with the elasticsearch node module.

