
I am using the Elasticsearch (version 5) scroll API to retrieve all documents matching a query and then write them to a CSV file. My code is below. It works, but there is one problem: it takes more than 5 minutes to download the file.

try
{
    TransportClient client = getTransportClient( NoSQLConnectionPool.ELASTIC_STAT );

    if( client != null )
    {
        SearchResponse scrollResp = client.prepareSearch( Constants.INDEX )
                .addSort( fieldSort( Constants.PRICE ).order( ASC ) )
                .setScroll( new TimeValue( 60000 ) )
                .setQuery( buildBoolQuery( request ) )
                .setSize( 100 ).get(); // max of 100 hits will be returned for each scroll

        // Scroll until no hits are returned
        do
        {
            List<JsonElement> list = getAllElement( scrollResp.getHits().getHits() );

            buildReportContent( sb, list ); // iterate list and append data to StringBuilder (sb)

            scrollResp = client.prepareSearchScroll( scrollResp.getScrollId() )
                    .setScroll( new TimeValue( 60000 ) )
                    .execute().actionGet();
        }
        while( scrollResp.getHits().getHits().length != 0 ); // zero hits mark the end of the scroll and the while loop
    }

    return CsvFileWriter.csvFileWrite( sb );
}
catch( Exception e )
{
    e.printStackTrace();
}

Any suggestions for doing this more efficiently?

Thank You!

  • How many documents are we talking about here? Commented Oct 13, 2017 at 7:13
  • Also, there's a much easier way to do it with Logstash: stackoverflow.com/questions/41763752/… Commented Oct 13, 2017 at 7:13
  • @Val, more than 85000 documents Commented Oct 13, 2017 at 7:16
  • Is the Logstash alternative viable for your case? Commented Oct 13, 2017 at 7:18
  • It might be because you're keeping all 85000 rows in memory before writing them out... maybe you should stream each batch to the CSV file on each iteration instead of waiting until the end to do it. Commented Oct 13, 2017 at 7:24
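A minimal sketch of the streaming suggestion from the last comment. The helper names getTransportClient, buildBoolQuery and getAllElement come from the question; appendBatchAsCsv and the "report.csv" output path are hypothetical placeholders. The idea is to write each scroll batch to the file as it arrives, so the 85000 rows never sit in a StringBuilder at once:

// Sketch only: stream each scroll batch straight to the CSV file instead of
// accumulating everything in memory before writing.
try( BufferedWriter writer = Files.newBufferedWriter( Paths.get( "report.csv" ), StandardCharsets.UTF_8 ) )
{
    TransportClient client = getTransportClient( NoSQLConnectionPool.ELASTIC_STAT );

    SearchResponse scrollResp = client.prepareSearch( Constants.INDEX )
            .addSort( fieldSort( Constants.PRICE ).order( ASC ) )
            .setScroll( new TimeValue( 60000 ) )
            .setQuery( buildBoolQuery( request ) )
            .setSize( 1000 )                   // larger batches mean fewer scroll round trips
            .get();

    while( scrollResp.getHits().getHits().length != 0 )
    {
        List<JsonElement> batch = getAllElement( scrollResp.getHits().getHits() );

        appendBatchAsCsv( writer, batch );     // hypothetical helper: writes only this batch's rows
        writer.flush();                        // keeps memory usage flat

        scrollResp = client.prepareSearchScroll( scrollResp.getScrollId() )
                .setScroll( new TimeValue( 60000 ) )
                .get();
    }

    // release the scroll context on the server once done
    client.prepareClearScroll().addScrollId( scrollResp.getScrollId() ).get();
}
catch( Exception e )
{
    e.printStackTrace();
}

Raising setSize from 100 to a larger value such as 1000 also cuts down the number of scroll round trips, which for 85000 documents is often where most of the time goes.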
