Upload data from Node.js stream to ElasticSearch database

Question

My current Node.js code creates a stream from a very large USPTO patent XML file (approx. 100 MB) and builds a patentGrant object for each grant while parsing the XML stream. Each patentGrant object includes the publication number, publication country, publication date, and kind of patent. I am trying to create a database containing all of the patentGrant objects using ElasticSearch. I've successfully added code to connect to the local ElasticSearch DB, but I am having trouble understanding the ElasticSearch-js API, and I don't know how I should go about uploading the patentGrant objects to the DB. From the following tutorial and a previous Stack Overflow question I asked here, it seems like I should use the bulk API.
Here's my ParseXml.js code:

var CreateParsableXml = require('./CreateParsableXml.js');
var XmlParserStream = require('xml-stream');
// var Upload2ES = require('./Upload2ES.js');
var parseXml;


var es = require('elasticsearch');
var client = new es.Client({
    host: 'localhost:9200'
});


// create xml parser using xml-stream node.js module
parseXml = new XmlParserStream(CreateParsableXml.concatXmlStream('ipg140107.xml'));

parseXml.on('endElement: us-patent-grant', function(patentGrantElement) {
    var patentGrant;
    patentGrant = {
        pubNo: patentGrantElement['us-bibliographic-data-grant']['publication-reference']['document-id']['doc-number'],
        pubCountry: patentGrantElement['us-bibliographic-data-grant']['publication-reference']['document-id']['country'],
        kind: patentGrantElement['us-bibliographic-data-grant']['publication-reference']['document-id']['kind'],
        pubDate: patentGrantElement['us-bibliographic-data-grant']['publication-reference']['document-id']['date']
    };
    console.log(patentGrant);
});

parseXml.on('end', function() {
    console.log('all done');
});

jperelli · Accepted Answer · 2015-07-12 22:17:53Z


The bulk API, as the docs you linked say, performs many "index" and "delete" operations in a single call.

To add documents one at a time as the parser emits them, use create: https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/api-reference.html#api-create

parseXml.on('endElement: us-patent-grant', function(patentGrantElement) {
    // All four fields live under the same document-id element.
    var docId = patentGrantElement['us-bibliographic-data-grant']['publication-reference']['document-id'];
    var patentGrant = {
        pubNo: docId['doc-number'],
        pubCountry: docId['country'],
        kind: docId['kind'],
        pubDate: docId['date']
    };
    // No id is passed, so ElasticSearch is expected to generate one.
    client.create({
        index: 'myindex',
        type: 'mytype',
        body: patentGrant
    }, function(err) {
        // Report failures instead of silently ignoring them.
        if (err) {
            console.error(err);
        }
    });
    console.log(patentGrant);
});

Without an ID, ElasticSearch should generate one automatically, as per https://www.elastic.co/guide/en/elasticsearch/reference/1.6/docs-index_.html#_automatic_id_generation
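If you would rather have stable ids, so that re-running the import does not add the same patents twice, one option is to pass your own id. The sketch below assumes the publication number is unique within the index, which I have not verified for this data set. `buildCreateParams` and its `useOwnId` flag are hypothetical names for illustration; 'myindex' and 'mytype' are the placeholders from the snippet above.

```javascript
// Sketch: build the parameters for client.create(), optionally with an explicit id.
// buildCreateParams is a hypothetical helper, not part of the elasticsearch client.
function buildCreateParams(patentGrant, useOwnId) {
    var params = {
        index: 'myindex',
        type: 'mytype',
        body: patentGrant
    };
    if (useOwnId) {
        // Assumption: pubNo uniquely identifies a grant within this index.
        params.id = String(patentGrant.pubNo);
    }
    return params;
}
```

You would then call client.create(buildCreateParams(patentGrant, true), callback). With an explicit id, create should fail for a document that already exists rather than add a duplicate, so the callback needs to handle that error.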


3 Comments

Daniel Kobe Over a year ago

This is great, thanks. Follow-up question: why, when I go to localhost:9200/mytype/myindex/, do I get the following error message? {"error":"ElasticsearchIllegalArgumentException[No feature for name [patentGrants]]","status":400}

jperelli Over a year ago

Are the index and mapping created? elastic.co/guide/en/elasticsearch/reference/1.6/…

Daniel Kobe Over a year ago

No, I did not create the mapping. Is there no default mapping that would take care of this for me? Also, I've been doing more research, and I heard from this video, youtube.com/watch?v=7FLXjgB0PQI, that you save a lot of network overhead by using the bulk API. For me, would using create be better? Otherwise I would have to store all the data in a JavaScript object, which would then get processed by bulk, at a very high memory cost.
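On the memory concern in the last comment: bulk does not require holding the whole file in memory. A minimal sketch, under stated assumptions, is to buffer a fixed number of documents and send each batch as soon as it is full. `createBulkIndexer`, `batchSize`, and the default of 500 are illustrative names and values, not part of the elasticsearch client. The request body follows the alternating action/document layout that the legacy elasticsearch.js bulk call expects.

```javascript
// Sketch: buffer documents and send them with the bulk API in fixed-size batches.
// createBulkIndexer is a hypothetical helper; client is any object with a
// bulk(params, callback) method, such as the legacy elasticsearch.js client.
function createBulkIndexer(client, options) {
    var batchSize = options.batchSize || 500;
    var pending = []; // alternating action/document entries for the next request

    function flush(done) {
        if (pending.length === 0) {
            return done(null);
        }
        var body = pending;
        pending = []; // start a fresh buffer so memory use stays bounded
        client.bulk({ body: body }, function (err, resp) {
            if (!err && resp && resp.errors) {
                err = new Error('some bulk items failed');
            }
            done(err || null);
        });
    }

    function add(doc, done) {
        // Each document is preceded by an action entry; no _id is given,
        // so ElasticSearch is expected to generate one.
        pending.push({ index: { _index: options.index, _type: options.type } });
        pending.push(doc);
        if (pending.length / 2 >= batchSize) {
            return flush(done);
        }
        done(null);
    }

    return { add: add, flush: flush };
}

// Hypothetical wiring with the parser from the question (toPatentGrant stands
// for the object-building code in the handler; it is not defined here):
// var indexer = createBulkIndexer(client, { index: 'myindex', type: 'mytype' });
// parseXml.on('endElement: us-patent-grant', function (el) {
//     parseXml.pause();
//     indexer.add(toPatentGrant(el), function (err) {
//         if (err) { console.error(err); }
//         parseXml.resume();
//     });
// });
// parseXml.on('end', function () {
//     indexer.flush(function () { console.log('all done'); });
// });
```

The pause/resume calls assume xml-stream exposes them as its README describes, so check that against the version you have installed. Pausing while a batch is in flight keeps the parser from running ahead of ElasticSearch, and the final flush on 'end' sends whatever is left in a partial batch.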
