41

Is there a way to get a truly random sample from an elasticsearch index? i.e. a query that retrieves any document from the index with probability 1/N (where N is the number of documents currently indexed)?

And as a follow-up question: if all documents have some numeric field s, is there a way to get a document through weighted random sampling, i.e. where the probability to get document i with value s_i is equal to s_i / sum(s_j for j in index)?

5 Answers

78

I know it is an old question, but now it is possible to use random_score, with the following search query:

{
   "size": 1,
   "query": {
      "function_score": {
         "functions": [
            {
               "random_score": {
                  "seed": "1477072619038"
               }
            }
         ]
      }
   }
}

For me it is very fast with about 2 million documents.

I use the current timestamp as the seed, but you can use anything you like. The advantage is that the same seed always produces the same results, so you can use a user's session id as the seed and each user will get a different order.
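As a rough sketch of the session-id idea, the Python below builds the same request body with a seed derived from a session id. The helper name `random_order_query` and the choice of SHA-256 are assumptions for illustration only and are not part of any Elasticsearch client. Newer Elasticsearch versions may also expect a `field` next to `seed`, which is not shown here.

```python
import hashlib
import json


def random_order_query(session_id: str, size: int = 1) -> dict:
    """Build a function_score body whose random order is stable per session."""
    # Hash the session id to a 32-bit integer so the same session always
    # yields the same seed (Python's built-in hash() is salted per process).
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:4], "big")
    return {
        "size": size,
        "query": {
            "function_score": {
                "functions": [{"random_score": {"seed": seed}}]
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical session id; print the body that would be sent to _search.
    print(json.dumps(random_order_query("session-abc"), indent=2))
```

The resulting dictionary can be serialized and sent as the body of a `_search` request with whatever client you already use.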


2 Comments

NOTE: in newer versions of ES, if you don't provide a seed, the current timestamp is used by default. Also, I've found that if you let ES use its own seed (the current timestamp), the query is 20x faster (this is on a very large cluster: 6 seconds vs. 150 seconds).
7

The only way I know of to get random documents from an index (at least in versions <= 1.3.1) is to use a script:

"sort": {
  "_script": {
    "script": "Math.random() * 200000",
    "type": "number",
    "params": {},
    "order": "asc"
  }
}

You can use that script to make some weighting based on some field of the record.
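To illustrate what a weighting based on a field could look like, here is a minimal local sketch of one known technique (Efraimidis-Spirakis weighted keys): give each document the key `u ** (1 / s)` with `u` uniform in [0, 1) and keep the document with the largest key, which selects document i with probability s_i / sum(s_j). This is a swapped-in technique, not something the answer above specifies, and it runs on an in-memory dictionary rather than on Elasticsearch. The function name and sample weights are hypothetical. The same key could presumably be computed in a sort script from the document's field, but that is not shown or tested here.

```python
import random
from collections import Counter
from typing import Optional


def weighted_pick(weights: dict, rng: random.Random) -> Optional[str]:
    """Return one key, chosen with probability weight / sum(weights)."""
    best_id, best_key = None, -1.0
    for doc_id, s in weights.items():
        if s <= 0:
            continue  # documents with non-positive weight are never selected
        # Random sort key u ** (1 / s): larger weights push the key toward 1.
        key = rng.random() ** (1.0 / s)
        if key > best_key:
            best_id, best_key = doc_id, key
    return best_id


if __name__ == "__main__":
    rng = random.Random(42)
    weights = {"a": 1.0, "b": 3.0}  # hypothetical values of field s
    draws = 20000
    counts = Counter(weighted_pick(weights, rng) for _ in range(draws))
    # Observed frequencies should be close to 0.25 for "a" and 0.75 for "b".
    print({k: round(v / draws, 3) for k, v in sorted(counts.items())})
```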

It's possible that in the future they might add something more complicated, but you'd likely have to request that from the ES team.

2 Comments

You can't use a seed with this. Also, n documents will be grouped together with the same score, where n is the shard size.
Does Math.random() in a Painless script return a value between 0 and 1 inclusive?
6

You can use random_score with a function_score query.

{
    "size":1,
    "query": {
        "function_score": {
            "functions": [
                {
                    "random_score":  {
                        "seed": 11
                    }
                }
            ],
            "score_mode": "sum"
        }
    }
}

The bad part is that this will apply a random score to every document, sort the documents, and then return the first one. I don't know of anything that is smart enough to just pick a random document.
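The behavior described above can be mimicked locally to see both the cost and the resulting distribution. The sketch below is only an in-memory model of "score everything, sort, take the first" and does not call Elasticsearch. The function name and document ids are made up for illustration.

```python
import random
from collections import Counter


def pick_by_random_score(doc_ids: list, rng: random.Random):
    """Give every document a random score and return the top-scoring one."""
    # Every document is scored, so the work grows with the index size
    # even though only a single hit is returned.
    scored = [(rng.random(), doc_id) for doc_id in doc_ids]
    return max(scored, key=lambda pair: pair[0])[1]


if __name__ == "__main__":
    rng = random.Random(7)
    docs = ["d1", "d2", "d3", "d4"]  # hypothetical document ids
    draws = 20000
    counts = Counter(pick_by_random_score(docs, rng) for _ in range(draws))
    # Each document should be returned in roughly 1/N of the draws.
    print({k: round(v / draws, 3) for k, v in sorted(counts.items())})
```

Because the scores are independent and identically distributed, each document is equally likely to come out on top, which matches the 1/N probability asked for in the question.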


5

NEST way:

var result = _elastic.Search<dynamic>(s => s
        .Query(q => q
        .FunctionScore(fs => fs.Functions(f => f.RandomScore())
        .Query(fq => fq.MatchAll()))));

Raw query way:

GET index-name/_search
{
    "size": 1,
    "query": {
        "function_score": {
            "query": { "match_all": {} },
            "random_score": {}
        }
    }
}


2

You can use random_score to randomly order responses or retrieve a document with roughly 1/N probability.

Additional notes:

https://github.com/elastic/elasticsearch/issues/1170
https://github.com/elastic/elasticsearch/issues/7783
