I'm using Elasticsearch in my Laravel app and I want to fetch a large amount of data from a third-party API. I've read that I need to use the scroll API provided by Elasticsearch, but I can't figure out how to use it correctly. There are about 2 million records. How can I fetch them all?

Here is what I did so far:

$query = '
{
    "_source":[
        "Company.*",
        "Company.Metadata.*"
    ],
    "query":{
        "bool": {
            "must": [
                {
                    "match": {
                        "Company.Metadata.status": "active"
                    }
                }
            ]
        }
    },
    "size": 1000
}';

$curl = curl_init();
curl_setopt($curl, CURLOPT_POST, 1);

curl_setopt($curl, CURLOPT_POSTFIELDS, $query);
curl_setopt($curl, CURLOPT_URL, "http://thirdpartyapidomain.com/_search?scroll=1m");
curl_setopt($curl, CURLOPT_HTTPHEADER, array(
   'Content-Type: application/json'
));
curl_setopt($curl, CURLOPT_USERPWD, "user:mypassword");
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);

$result = json_decode(curl_exec($curl));

curl_close($curl);
$hits = $result->hits->hits; 

return $hits;

This gives me 1000 records and the scroll_id, but what do I do next?

Please help!

1 Answer

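To continue scrolling, pass the scroll_id from the previous response to the scroll API as shown below. Repeat this request in a loop until the response contains no more hits. Each response returns a scroll_id; always use the most recent one, since it can change between requests.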

POST /_search/scroll
{
    "scroll": "1m",
    "scroll_id": "<scroll_id_here>"
}

The scroll value (here 1m, one minute) keeps the scroll context alive so that the next scroll request doesn't fail. Set it to a duration long enough to finish processing the current batch.

You can send this request with curl, just like the initial search. The Elasticsearch scroll API documentation covers this in more detail.
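As a rough illustration of that loop in PHP, here is a minimal sketch that reuses the curl setup from the question. The function names `esPost` and `scrollAll` are made up for this example, and the base URL and credentials are the placeholders from the question. It assumes the response carries `_scroll_id` and `hits.hits`, as the scroll API normally returns them.

```php
<?php
// Hypothetical helper: POST a JSON body with basic auth, return the decoded response.
function esPost(string $url, string $jsonBody, string $userPwd): array
{
    $curl = curl_init($url);
    curl_setopt_array($curl, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $jsonBody,
        CURLOPT_HTTPHEADER     => ['Content-Type: application/json'],
        CURLOPT_HTTPAUTH       => CURLAUTH_BASIC,
        CURLOPT_USERPWD        => $userPwd,
        CURLOPT_RETURNTRANSFER => true,
    ]);
    $raw = curl_exec($curl);
    if ($raw === false) {
        throw new RuntimeException('Request failed: ' . curl_error($curl));
    }
    $decoded = json_decode($raw, true);
    return is_array($decoded) ? $decoded : [];
}

// Hypothetical helper: yield one batch of hits per request until no hits come back.
// $post is any callable taking (url, body) and returning the decoded response array.
function scrollAll(callable $post, string $baseUrl, string $query, string $keepAlive = '1m'): Generator
{
    // The initial search opens the scroll context and returns the first batch.
    $response = $post($baseUrl . '/_search?scroll=' . $keepAlive, $query);

    while (!empty($response['hits']['hits'])) {
        yield $response['hits']['hits'];

        // Ask for the next batch using the most recent scroll_id.
        $body = json_encode([
            'scroll'    => $keepAlive,
            'scroll_id' => $response['_scroll_id'],
        ]);
        $response = $post($baseUrl . '/_search/scroll', $body);
    }
}

// Example usage (commented out so the sketch makes no network calls on its own):
// $post = fn (string $url, string $body): array => esPost($url, $body, 'user:mypassword');
// foreach (scrollAll($post, 'http://thirdpartyapidomain.com', $query) as $batch) {
//     // $batch holds up to 1000 hits; process or store them here.
// }
```

Using a generator means only one batch is held in memory at a time, which matters with around 2 million records. Passing the request function in as a callable is just a convenience for swapping in a fake during testing.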

1 Comment

Hmm, I see. It's hard to find any good real-world examples anywhere, so I'm kind of stuck on how to continue... I just want to fetch 1000 records, then the next 1000, and so on, but I don't know how... :-s
