I am using the following ES query when looking for duplicates:
    "aggs": {
      "duplicates": {
        "terms": {
          "field": "phone",
          "min_doc_count": 2,
          "size": 99999,
          "order": {
            "_term": "asc"
          }
        },
        "aggs": {
          "_docs": {
            "top_hits": {
              "size": 99999
            }
          }
        }
      }
    }
It works well. It returns the key (in this case the phone number) and, inside each bucket, all the matching documents. That is also the main problem: _source contains every field, which in my case is a lot of fields, and I want to return only the ones I need. Example of what's returned:
    "duplicates": {
      "1": {
        "key": "1",
        "doc_count": 2,
        "_docs": {
          "hits": {
            "total": 2,
            "max_score": 1,
            "hits": [
              {
                "_index": "local:company_id:1:sync",
                "_type": "leads",
                "_id": "23",
                "_score": 1,
                "_source": {
                  "id": 23,
                  "phone": 123456,
                  "areacode_id": 426,
                  "areacode_state_id": 2,
                  "firstName": "Brayan",
                  "lastName": "Rastelli",
                  "state": "", // .... and so on
I want to specify which fields are returned in _source. Is that possible?
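From what I can tell in the Elasticsearch docs, the `top_hits` aggregation accepts a `_source` option that applies the same source filtering as a normal search request. A sketch of my query with it added is below. The field list (`id`, `phone`, `firstName`, `lastName`) is only an illustrative pick from the output above, the outer `{ }` is added so the body is complete JSON, and I have not confirmed the exact behaviour on my ES version.

```json
{
  "aggs": {
    "duplicates": {
      "terms": {
        "field": "phone",
        "min_doc_count": 2,
        "size": 99999,
        "order": {
          "_term": "asc"
        }
      },
      "aggs": {
        "_docs": {
          "top_hits": {
            "size": 99999,
            "_source": ["id", "phone", "firstName", "lastName"]
          }
        }
      }
    }
  }
}
```

If this works as documented, each hit inside `_docs` should carry only those fields in `_source`, while `_index`, `_type`, `_id` and `_score` are still returned.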
Another problem I'm having: I want to order the aggregation results by a specific field (id), but if I put any field name in place of _term, I get an error.
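As far as I understand the `terms` aggregation docs, buckets can only be ordered by the bucket key (`_term`, renamed `_key` in ES 6.0), by `_count`, or by a single-value metric sub-aggregation, not by a raw document field. That would explain the error. A bucket holds several documents with different ids, so one assumption is to order buckets by the smallest id in each bucket. The sub-aggregation name `min_id` below is hypothetical. A `sort` inside `top_hits` then orders the documents within each bucket by id:

```json
{
  "aggs": {
    "duplicates": {
      "terms": {
        "field": "phone",
        "min_doc_count": 2,
        "size": 99999,
        "order": {
          "min_id": "asc"
        }
      },
      "aggs": {
        "min_id": {
          "min": {
            "field": "id"
          }
        },
        "_docs": {
          "top_hits": {
            "size": 99999,
            "sort": [
              { "id": { "order": "asc" } }
            ]
          }
        }
      }
    }
  }
}
```

Using `max` instead of `min` would order buckets by their largest id instead. Which of the two is right depends on what "order by id" should mean for a group of duplicates.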
Thank you!