My current Node.js code creates a stream from a very large USPTO patent XML file (approx. 100 MB) and builds a patentGrant object while parsing the XML stream. The patentGrant object includes the publication number, publication country, publication date and kind of patent. I am trying to create a database containing all of the patentGrant objects using Elasticsearch. I've successfully added code to connect to the local Elasticsearch DB, but I am having trouble understanding the elasticsearch-js API and don't know how I should go about uploading the patentGrant object to the DB. From the following tutorial and a previous Stack Overflow question I asked here, it seems like I should use the bulk API.
Here's my ParseXml.js code:

var CreateParsableXml = require('./CreateParsableXml.js');
var XmlParserStream = require('xml-stream');
// var Upload2ES = require('./Upload2ES.js');
var parseXml;


var es = require('elasticsearch');
var client = new es.Client({
    host: 'localhost:9200'
});


// create xml parser using xml-stream node.js module
parseXml = new XmlParserStream(CreateParsableXml.concatXmlStream('ipg140107.xml'));

parseXml.on('endElement: us-patent-grant', function(patentGrantElement) {
    var patentGrant;
    patentGrant = {
        pubNo: patentGrantElement['us-bibliographic-data-grant']['publication-reference']['document-id']['doc-number'],
        pubCountry: patentGrantElement['us-bibliographic-data-grant']['publication-reference']['document-id']['country'],
        kind: patentGrantElement['us-bibliographic-data-grant']['publication-reference']['document-id']['kind'],
        pubDate: patentGrantElement['us-bibliographic-data-grant']['publication-reference']['document-id']['date']
    };
    console.log(patentGrant);
});

parseXml.on('end', function() {
    console.log('all done');
});

1 Answer

The bulk API, as it says in the docs you linked, is used for "index" and "delete" operations.

To add each document as it is parsed, use create instead: https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/api-reference.html#api-create

parseXml.on('endElement: us-patent-grant', function(patentGrantElement) {
    var patentGrant;
    patentGrant = {
        pubNo: patentGrantElement['us-bibliographic-data-grant']['publication-reference']['document-id']['doc-number'],
        pubCountry: patentGrantElement['us-bibliographic-data-grant']['publication-reference']['document-id']['country'],
        kind: patentGrantElement['us-bibliographic-data-grant']['publication-reference']['document-id']['kind'],
        pubDate: patentGrantElement['us-bibliographic-data-grant']['publication-reference']['document-id']['date']
    };
    client.create({
        index: 'myindex',
        type: 'mytype',
        body: patentGrant
    }, function(err) {
        // Report indexing failures instead of silently ignoring them
        if (err) console.error(err);
    });
    console.log(patentGrant);
});

Without an explicit id, Elasticsearch should generate one automatically, as per https://www.elastic.co/guide/en/elasticsearch/reference/1.6/docs-index_.html#_automatic_id_generation
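
As a side note, the handler repeats the same long property path four times. A minimal sketch of pulling that out into a helper follows; toPatentGrant is a hypothetical name, not something from xml-stream or the Elasticsearch client, and it assumes every grant element has the nested document-id node used in the question.

```javascript
// Hypothetical helper: builds the patentGrant object from a parsed
// us-patent-grant element, reading the nested document-id node once.
// Assumes the element has the same shape as in the question's handler.
function toPatentGrant(patentGrantElement) {
    var docId = patentGrantElement['us-bibliographic-data-grant']['publication-reference']['document-id'];
    return {
        pubNo: docId['doc-number'],
        pubCountry: docId['country'],
        kind: docId['kind'],
        pubDate: docId['date']
    };
}
```

Inside the 'endElement: us-patent-grant' handler this would be used as var patentGrant = toPatentGrant(patentGrantElement); before handing the object to the client.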

3 Comments

This is great, thanks. Follow-up question: how come when I go to localhost:9200/mytype/myindex/ it gives me the following error message? {"error":"ElasticsearchIllegalArgumentException[No feature for name [patentGrants]]","status":400}
Are the index and mapping created? elastic.co/guide/en/elasticsearch/reference/1.6/…
No, I did not create the mapping. Is there no default mapping that would take care of this for me? Also, I've been doing more research and heard from this video youtube.com/watch?v=7FLXjgB0PQI that you save a lot of network overhead by using the bulk API. For me, would using create be better, since otherwise I would have to store all the data in a JavaScript object to be processed by bulk, which would have a very high memory cost?
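
On the memory concern in the last comment: bulk does not require holding the whole file in memory, because documents can be buffered and sent in fixed-size batches. Below is a rough sketch of that idea, not a tested production setup. createBulkBatcher and its parameters are hypothetical names; send is an injected function so the sketch runs without a cluster, and in real use it could wrap client.bulk({ body: body }, callback) from the elasticsearch client.

```javascript
// Hypothetical helper: buffers documents and flushes them in fixed-size batches.
// `send` is an assumed function (body, callback), e.g. a thin wrapper around
// client.bulk({ body: body }, callback).
function createBulkBatcher(send, indexName, typeName, batchSize) {
    var body = [];   // alternating action line / document, the bulk body layout
    var count = 0;   // number of documents currently buffered

    // Sends whatever is buffered and starts a fresh, empty buffer.
    function flush(callback) {
        if (count === 0) { return callback(null); }
        var pending = body;
        body = [];
        count = 0;
        send(pending, callback);
    }

    // Queues one document; triggers a flush once batchSize documents are buffered.
    function add(doc, callback) {
        body.push({ index: { _index: indexName, _type: typeName } });
        body.push(doc);
        count += 1;
        if (count >= batchSize) { return flush(callback); }
        callback(null);
    }

    return { add: add, flush: flush };
}
```

The idea would be to call add(patentGrant, ...) in the 'endElement: us-patent-grant' handler and flush(...) once in the 'end' handler to send the final partial batch. At most batchSize documents are buffered at a time, so the buffer stays small regardless of the size of the XML file.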
