I have the following document indexed in Elasticsearch 6:

{
  "id": 1234,
  ...,
  "images": [
    {
      "id": 1703805,
      ...,
      "language_codes": [],
      "ingest_source_ids": [123]
    },
    {
      "id": 2481938,
      ...,
      "language_codes": ["EN"],
      "ingest_source_ids": [1,2,3]
    }
  ]
}

The images object is mapped as nested.

I can find the document just fine using this query:

{
  "query": {
    "nested": {
      "path": "images",
      "query": {
        "term": {
          "images.ingest_source_ids": 123
        }
      }
    }
  }
}

But if I try to search by language_codes instead, the document is not found:

{
  "query": {
    "nested": {
      "path": "images",
      "query": {
        "term": {
          "images.language_codes": "EN"
        }
      }
    }
  }
}

ingest_source_ids has been in the documents since day one; the language_codes field was added later. I recall that Elasticsearch does some automatic mapping based on the first documents it sees, but as far as I can tell from the documentation, arrays need no special mapping: any field can hold an array as long as all of its values are of the same type.

That works fine for ingest_source_ids, where all values are numeric. The values in language_codes are always strings, so I would expect the same to apply.

What am I missing?

1 Answer

If you have not explicitly defined a mapping for language_codes, it will by default be indexed as:

"language_codes": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
}
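
To see what this default mapping does to a value at index time, you can run the value through the _analyze API. A minimal sketch of the request body, assuming the index uses the default standard analyzer (send it as a POST to the _analyze endpoint):

```json
{
  "analyzer": "standard",
  "text": "EN"
}
```

The response should contain a single token, en, which is what ends up in the index for the text field.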

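Because you are using a term query, which does not analyze the search term, you must run it against the keyword sub-field to get an exact match. The text field is analyzed at index time (the standard analyzer lowercases EN to en), so the unanalyzed term EN never matches it.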

Replace your query with:

{
  "query": {
    "nested": {
      "path": "images",
      "query": {
        "term": {
          "images.language_codes.keyword": "EN"
        }
      }
    }
  }
}
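
If you would rather query images.language_codes directly, another option is to map the field as keyword yourself. The mapping of an existing field cannot be changed in place, so this means creating a new index with the mapping and reindexing into it. Below is a sketch of the create-index body for Elasticsearch 6. The type name _doc is an assumption (use whatever type your index already has), and all other fields are omitted:

```json
{
  "mappings": {
    "_doc": {
      "properties": {
        "images": {
          "type": "nested",
          "properties": {
            "language_codes": { "type": "keyword" }
          }
        }
      }
    }
  }
}
```

With a mapping like this, the original term query on images.language_codes should match "EN" without the .keyword suffix.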


3 Comments

You are a life saver. Do you know why using .keyword returns many identical documents? I have a document with three images that all contain the language code EN, and I get 3 results instead of just the one unique document.
@mtrolle Using .keyword will return all the documents that have language_codes = "EN". Are you saying that you get duplicate documents and want only one unique document based on a particular field?
It turned out my expectations about the data were wrong. Thanks @ESCoder.
