How to turn an array of objects into an array of strings while reindexing in Elasticsearch?

Question

Let's say the source index has a document like this:

{
   "name":"John Doe",
   "sport":[
       {
          "name":"surf",
          "since":"2 years"
       },
       {
          "name":"mountainbike",
          "since":"4 years"
       }
   ]
}

How can I discard the "since" information so that, once reindexed, the document contains only the sport names? Like this:

{
   "name":"John Doe",
   "sport":["surf","mountainbike"]
}

Note that ideally the resulting field would keep the same name, but that's not mandatory.

ThomasC · Accepted Answer · 2018-07-12 15:45:24Z

I don't know which version of Elasticsearch you're using, but here is a solution based on ingest pipelines, which were introduced along with ingest nodes in Elasticsearch 5.0.

1) A script processor extracts the name from each sub-object and adds it to another field (here, sports).
2) The original sport field is then removed with a remove processor.
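The two processors can be mirrored in plain Python to check the expected result before touching the cluster. This is a minimal sketch, not Elasticsearch code: `apply_pipeline` is a hypothetical helper, and a plain dict stands in for the `ctx` document that the Painless script sees.

```python
def apply_pipeline(source):
    """Mimic the script + remove processors on a copy of a document's _source."""
    doc = dict(source)  # shallow copy so the caller's dict is left untouched
    # Script processor: collect the "name" of every sub-object into "sports".
    # Like the Painless script, this assumes "sport" exists and is a list.
    doc["sports"] = [item["name"] for item in doc["sport"]]
    # Remove processor: drop the original array of objects.
    del doc["sport"]
    return doc


source = {
    "name": "John Doe",
    "sport": [
        {"name": "surf", "since": "2 years"},
        {"name": "mountainbike", "since": "4 years"},
    ],
}

print(apply_pipeline(source))
# {'name': 'John Doe', 'sports': ['surf', 'mountainbike']}
```

If you would rather keep the original field name, as the question prefers, the same idea works by collecting the names in a local list inside the script and assigning it back to `sport`; the remove processor is then no longer needed.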

You can test it with the Simulate Pipeline API:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "random description",
    "processors": [
      {
        "script": {
          "lang": "painless",
          "source": "ctx.sports =[]; for (def item : ctx.sport) { ctx.sports.add(item.name)  }"
        }
      },
      {
        "remove": {
          "field": "sport"
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_type": "doc",
      "_id": "id",
      "_source": {
        "name": "John Doe",
        "sport": [
          {
            "name": "surf",
            "since": "2 years"
          },
          {
            "name": "mountainbike",
            "since": "4 years"
          }
        ]
      }
    }
  ]
}

which outputs the following result:

{
  "docs": [
    {
      "doc": {
        "_index": "index",
        "_type": "doc",
        "_id": "id",
        "_source": {
          "name": "John Doe",
          "sports": [
            "surf",
            "mountainbike"
          ]
        },
        "_ingest": {
          "timestamp": "2018-07-12T14:07:25.495Z"
        }
      }
    }
  ]
}
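To apply the transformation during an actual reindex rather than a simulation, the pipeline can be stored with `PUT _ingest/pipeline/<id>` and then referenced from the `dest` section of a `_reindex` request. The sketch below only builds and prints those two request bodies; the pipeline id `sport-names` and the index names `source_index` and `dest_index` are hypothetical placeholders, and sending the requests (with curl, Kibana Dev Tools or a client library) is left out.

```python
import json

PIPELINE_ID = "sport-names"  # hypothetical pipeline id

# Body for: PUT _ingest/pipeline/sport-names
# Same processors as in the _simulate request above.
pipeline_body = {
    "description": "Turn sport objects into a list of sport names",
    "processors": [
        {
            "script": {
                "lang": "painless",
                "source": "ctx.sports = []; for (def item : ctx.sport) { ctx.sports.add(item.name) }",
            }
        },
        {"remove": {"field": "sport"}},
    ],
}

# Body for: POST _reindex
# Each document read from source_index goes through the pipeline before
# being written to dest_index (both index names are hypothetical).
reindex_body = {
    "source": {"index": "source_index"},
    "dest": {"index": "dest_index", "pipeline": PIPELINE_ID},
}

print(f"PUT _ingest/pipeline/{PIPELINE_ID}")
print(json.dumps(pipeline_body, indent=2))
print("POST _reindex")
print(json.dumps(reindex_body, indent=2))
```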

There may be a better solution, as I haven't used pipelines much. Alternatively, you could do this with Logstash filters before sending the documents to your Elasticsearch cluster.

For more information about pipelines, see the ingest node reference documentation.
