Let's say the source index has a document like this:

{
   "name":"John Doe",
   "sport":[
       {
          "name":"surf",
          "since":"2 years"
       },
       {
          "name":"mountainbike",
          "since":"4 years"
       }
   ]
}

How can I discard the "since" information so that, once reindexed, the document contains only the sport names? Like this:

{
   "name":"John Doe",
   "sport":["surf","mountainbike"]
}

Note that it would be fine if the resulting field kept the same name, but that is not mandatory.

I don't know which version of Elasticsearch you're using, but here is a solution based on ingest pipelines, which were introduced with ingest nodes in Elasticsearch 5.0:

  • 1) A script processor extracts the "name" value from each subobject and stores the values in a new field (here, "sports").
  • 2) The original "sport" field is then deleted with a remove processor.
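To check the transformation outside Elasticsearch, here is a minimal Python sketch that mirrors the two processors on a plain dict. It is only a local illustration, not the ingest pipeline itself; the function name flatten_sport and its target_field parameter are hypothetical. Passing target_field="sport" shows the variant where the field keeps its original name.

```python
import copy


def flatten_sport(source, target_field="sports"):
    """Hypothetical local mirror of the pipeline: copy each sport's "name"
    into target_field, then drop the original "sport" array of objects."""
    doc = copy.deepcopy(source)  # leave the caller's dict untouched
    # Step 1 (like the script processor): collect the "name" of every subobject.
    names = [item["name"] for item in doc["sport"]]
    # Step 2 (like the remove processor): drop the original array of objects.
    del doc["sport"]
    doc[target_field] = names
    return doc


if __name__ == "__main__":
    original = {
        "name": "John Doe",
        "sport": [
            {"name": "surf", "since": "2 years"},
            {"name": "mountainbike", "since": "4 years"},
        ],
    }
    print(flatten_sport(original))
    # {'name': 'John Doe', 'sports': ['surf', 'mountainbike']}
    print(flatten_sport(original, target_field="sport"))
    # {'name': 'John Doe', 'sport': ['surf', 'mountainbike']}
```

Because the original array is deleted before the list of names is assigned, the same function also covers the case where the field keeps the name "sport".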

You can use the Simulate Pipeline API to test it:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "Keep only the sport names",
    "processors": [
      {
        "script": {
          "lang": "painless",
          "source": "ctx.sports = []; for (def item : ctx.sport) { ctx.sports.add(item.name); }"
        }
      },
      {
        "remove": {
          "field": "sport"
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_type": "doc",
      "_id": "id",
      "_source": {
        "name": "John Doe",
        "sport": [
          {
            "name": "surf",
            "since": "2 years"
          },
          {
            "name": "mountainbike",
            "since": "4 years"
          }
        ]
      }
    }
  ]
}

which outputs the following result:

{
  "docs": [
    {
      "doc": {
        "_index": "index",
        "_type": "doc",
        "_id": "id",
        "_source": {
          "name": "John Doe",
          "sports": [
            "surf",
            "mountainbike"
          ]
        },
        "_ingest": {
          "timestamp": "2018-07-12T14:07:25.495Z"
        }
      }
    }
  ]
}
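The _simulate call above only tests the pipeline; it does not modify any index. To apply it while reindexing, the pipeline can be stored with PUT _ingest/pipeline/<id> and referenced through the dest.pipeline option of the Reindex API. The Python sketch below only builds and prints those two request bodies so they can be pasted into the console; the pipeline id sport-names and the index names source-index and dest-index are hypothetical placeholders.

```python
import json

# Hypothetical identifiers: replace them with your own pipeline and index names.
PIPELINE_ID = "sport-names"
SOURCE_INDEX = "source-index"
DEST_INDEX = "dest-index"

# Same Painless script as in the _simulate request.
SCRIPT = (
    "ctx.sports = []; "
    "for (def item : ctx.sport) { ctx.sports.add(item.name); }"
)


def pipeline_body():
    """Body for PUT _ingest/pipeline/<PIPELINE_ID>."""
    return {
        "description": "Keep only the sport names",
        "processors": [
            {"script": {"lang": "painless", "source": SCRIPT}},
            {"remove": {"field": "sport"}},
        ],
    }


def reindex_body():
    """Body for POST _reindex; dest.pipeline sends every copied document through the pipeline."""
    return {
        "source": {"index": SOURCE_INDEX},
        "dest": {"index": DEST_INDEX, "pipeline": PIPELINE_ID},
    }


if __name__ == "__main__":
    print(f"PUT _ingest/pipeline/{PIPELINE_ID}")
    print(json.dumps(pipeline_body(), indent=2))
    print()
    print("POST _reindex")
    print(json.dumps(reindex_body(), indent=2))
```

Storing the pipeline once and referencing it by id keeps the reindex request short, and the same pipeline can later be reused for new documents indexed into the destination index.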

There may be a better solution, as I haven't used pipelines much. Alternatively, you could do this with Logstash filters before sending the documents to your Elasticsearch cluster.

For more information about pipelines, see the ingest node reference documentation.
