
I'm trying to get data from Elasticsearch with my Node application. My index holds 1 million records, so I can't send the whole dataset to another service in a single call. That's why I want to fetch 10,000 records per request, for example:

const getCodesFromElasticSearch = async (batch) => {
  // batch is 1-based: batch 1 starts at 0, batch 2 at 10,000, and so on
  const startingCount = (batch - 1) * 10000;
  const data = await esClient.search({
    index: 'myIndex',
    type: 'codes',
    _source: ['column1', 'column2', 'column3'],
    body: {
      from: startingCount,
      size: 10000,
      query: {
        bool: {
          must: [
              ....
          ],
          filter: {
              ....
          }
        }
      },
      sort: {
        sequence: {
          order: "asc"
        }
      }
    }
  });
  return data.hits.hits.map(esObject => esObject._source);
};

It works when batch=1, but with batch=2 it fails because, per the documentation, from + size may not exceed 10,000. And I don't want to raise index.max_result_window either. Please let me know an alternative way to fetch the records 10,000 at a time.

1 Answer

The scroll API can be used to retrieve large numbers of results (or even all results) from a single search request, in much the same way as you would use the cursor on a traditional database.

So you can use the scroll API to fetch your whole 1M dataset without using from, something like the snippet below. Elasticsearch's normal search limits from + size to 10,000 records per request, so using a larger from returns an error; that's why scrolling is a good solution for this kind of scenario.

let allRecords = [];

// First do a normal search, specifying a scroll timeout to open the scroll context
let { _scroll_id, hits } = await esclient.search({
  index: 'myIndex',
  type: 'codes',
  scroll: '30s',
  body: {
    size: 10000,                 // records per scroll batch
    query: {
      match_all: {}
    },
    _source: ['column1', 'column2', 'column3']
  }
});

while (hits && hits.hits.length) {
  // Append all new hits
  allRecords.push(...hits.hits);

  console.log(`${allRecords.length} of ${hits.total}`);

  // Fetch the next batch using the scroll id
  ({ _scroll_id, hits } = await esclient.scroll({
    scrollId: _scroll_id,
    scroll: '30s'
  }));
}

console.log(`Complete: ${allRecords.length} records retrieved`);

You can also add your own query and sort to this code snippet.
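For instance, the initial scroll request body can carry the same bool query and sort as the paginated version; the must/filter contents below are only placeholders for the question's actual clauses:

```javascript
// Sketch of the initial scroll search body with a bool query and sort
// plugged in. The empty must/filter arrays stand in for your real clauses.
const scrollSearchBody = {
  size: 10000,                              // records per scroll batch
  query: {
    bool: {
      must: [],                             // your must clauses here
      filter: []                            // your filter clauses here
    }
  },
  sort: [{ sequence: { order: 'asc' } }],   // same sort as the original query
  _source: ['column1', 'column2', 'column3']
};
```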

As per the comment:

Step 1. Do a normal esclient.search and get the hits and _scroll_id. Send the hits data to your other service, and keep the _scroll_id for fetching the next batch of data.

Step 2. Use the _scroll_id from the first batch in a while loop with esclient.scroll until you have all your 1M records. Keep in mind that you don't need to wait for all 1M records: inside the while loop, as soon as a response comes back, send it to your service batch by batch.
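The two steps above can be sketched like this. Here `esclient` is assumed to be an instantiated Elasticsearch client, and `sendToService` is a hypothetical function standing in for whatever forwards one batch to the receiving service:

```javascript
// Stream the index to another service 10,000 records at a time via the
// scroll API, without ever holding all 1M records in memory at once.
const streamCodesInBatches = async (esclient, sendToService) => {
  // Step 1: the initial search opens the scroll context
  let { _scroll_id, hits } = await esclient.search({
    index: 'myIndex',
    scroll: '30s',
    body: {
      size: 10000,                          // records per batch
      query: { match_all: {} },
      _source: ['column1', 'column2', 'column3']
    }
  });

  // Step 2: keep scrolling until a batch comes back empty
  while (hits && hits.hits.length) {
    // forward this batch immediately instead of accumulating everything
    await sendToService(hits.hits.map(h => h._source));
    ({ _scroll_id, hits } = await esclient.scroll({
      scrollId: _scroll_id,
      scroll: '30s'
    }));
  }

  // free the scroll context on the cluster
  await esclient.clearScroll({ scrollId: _scroll_id });
};
```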

See Scroll API: https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/scroll_examples.html

See Search After: https://www.elastic.co/guide/en/elasticsearch/reference/5.2/search-request-search-after.html
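For completeness, search_after (from the linked docs) is a stateless alternative to scrolling: instead of a scroll id, each request passes the sort values of the previous page's last hit. A rough sketch, reusing the question's index, sort field, and columns, with `esclient` again assumed to be a client instance:

```javascript
// Fetch one page of up to 10,000 records using search_after pagination.
// Pass null for the first page, then the returned lastSortValues for each
// subsequent page until it comes back null.
const getNextPage = async (esclient, lastSortValues) => {
  const body = {
    size: 10000,
    sort: [{ sequence: 'asc' }],          // sort must be deterministic
    query: { match_all: {} },
    _source: ['column1', 'column2', 'column3']
  };
  if (lastSortValues) {
    body.search_after = lastSortValues;   // resume after the previous page
  }
  const res = await esclient.search({ index: 'myIndex', body });
  const hits = res.hits.hits;
  return {
    records: hits.map(h => h._source),
    // hand this back on the next call to continue paging
    lastSortValues: hits.length ? hits[hits.length - 1].sort : null
  };
};
```

Unlike scrolling, this keeps no server-side state, so each 10,000-record request is independent, which matches the batch-by-batch delivery the question asks for.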


4 Comments

I know that solution already. The problem is I don't want to fetch all the records in my index at once, because that would cause performance issues on the receiver side; that's why I want to send them batch by batch.
@PPShein I am a little confused by your comment. Using the scroll API you get data one scroll batch at a time, so you can send that dataset to your other service, then the next scrolled dataset, and so on.
As I mentioned in my question, I don't want to return over 1 million records in a single request because of the receiver's performance. That's why I want to respond with 10,000 records per request. But from doesn't allow a value that large.
@PPShein I've updated my answer with some steps, please have a look and let me know if that makes sense. Maybe I'm wrong, but it seems you're not very familiar with how the scroll API works. Correct me if I'm wrong :)
