0

I have a static mapping in an Elasticsearch index. When a message doesn't match this mapping, it is discarded. Is there a way to route it to a default index for malformed messages?

To give an example, I have some fields of integer type:

"status_code": { 
    "type": "integer" 
},

When a message contains a number

"status_code": 123, 

it's ok. But when it is

"status_code": "abc"

it fails.

2
  • Yes, that's possible. Can you show your mapping, a sample document that is indexed successfully, and another one that fails? Commented Feb 18, 2019 at 10:37
  • Please update your question with the requested information (not in comments) Commented Feb 18, 2019 at 10:40

3 Answers 3

2

You can have ES do this triage pretty easily using ingest nodes/processors.

The main idea is to create an ingest pipeline with a convert processor for the status_code field. If the conversion fails, an on_failure handler redirects the document to another index, where you can process it later.

So create the failures ingest pipeline:

PUT _ingest/pipeline/failures
{
  "processors": [
    {
      "convert": {
        "field": "status_code",
        "type": "integer"
      }
    }
  ],
  "on_failure": [
    {
      "set": {
        "field": "_index",
        "value": "failed-{{ _index }}"
      }
    }
  ]
}
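
If you also want to keep the reason for the failure next to the rerouted document, a variant along these lines might help. It is only a sketch: the pipeline name failures-with-reason and the field name ingest_error are made up for illustration, while {{ _ingest.on_failure_message }} is the ingest metadata that is available inside on_failure blocks.

```
PUT _ingest/pipeline/failures-with-reason
{
  "processors": [
    {
      "convert": {
        "field": "status_code",
        "type": "integer"
      }
    }
  ],
  "on_failure": [
    {
      "set": {
        "field": "_index",
        "value": "failed-{{ _index }}"
      }
    },
    {
      "set": {
        "field": "ingest_error",
        "value": "{{ _ingest.on_failure_message }}"
      }
    }
  ]
}
```

The rest of this answer keeps using the simpler failures pipeline defined above.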

Then, when you index a document, you simply specify the pipeline as a request parameter. Indexing a document with a correct status code will succeed:

PUT test/doc/1?pipeline=failures
{
  "status_code": 123
}

Indexing a document with a bad status code will also succeed, but the document ends up in the failed-test index instead of the test one:

PUT test/doc/2?pipeline=failures
{
  "status_code": "abc"
}

After running these two commands, you'll see this:

GET failed-test/_search
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "failed-test",
        "_type" : "doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "status_code" : "abc"
        }
      }
    ]
  }
}
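
To check where a document would end up without actually indexing anything, the simulate API can be pointed at the same pipeline. This is a sketch that reuses the two sample documents from above. The response should report the second document with its _index rewritten to failed-test, but it is worth verifying on your own cluster.

```
POST _ingest/pipeline/failures/_simulate
{
  "docs": [
    {
      "_index": "test",
      "_id": "1",
      "_source": { "status_code": 123 }
    },
    {
      "_index": "test",
      "_id": "2",
      "_source": { "status_code": "abc" }
    }
  ]
}
```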

To sum up, you don't have to handle that exceptional case in your client code; you can let the ES ingest node do the triage for you.
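
If you would rather not pass ?pipeline=failures on every request, one option is to make it the default pipeline of the index. This sketch assumes Elasticsearch 6.5 or later, where the index.default_pipeline setting is available, and reuses the test index from above.

```
PUT test/_settings
{
  "index.default_pipeline": "failures"
}
```

Requests that name a pipeline explicitly still use the one they name.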


2 Comments

If I understand it correctly, it uses additional message processing to check whether a field can be converted. I wonder if it can be a significant performance issue.
Since there's only one roundtrip involved (compared to a solution where you need to check for an exception on the client side and issue a second call), I'd say that the overhead is negligible. The best way to find out is to try and see.
1

You can set the ignore_malformed mapping parameter so that only the field with the type mismatch is ignored, not the whole document.

You can also try combining it with multi-fields, which let you map the same value in different ways.

You will probably need something like this:

"status_code": {
  "type": "integer",
  "fields": {
    "as_string": {
      "type": "keyword"
    }
  }
}

This way you will have a field named status_code as an integer and the same value in a field named status_code.as_string as a keyword, but you should test whether it really does what you want.
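
Put together, an index combining ignore_malformed with the multi-field might be created roughly as follows. Treat it as a sketch: the index name test-lenient is made up, and the doc type level follows the 6.x mapping syntax used elsewhere on this page (on 7.x and later, that level is dropped).

```
PUT test-lenient
{
  "mappings": {
    "doc": {
      "properties": {
        "status_code": {
          "type": "integer",
          "ignore_malformed": true,
          "fields": {
            "as_string": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}
```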

1 Comment

@jaros Not sure how this answer helps route the document to another index.
0

Use strict mapping and you will be able to catch the exception raised by Elasticsearch.

Below is an excerpt from the Elastic docs:

By default, when a previously unseen field is found in a document, Elasticsearch will add the new field to the type mapping. This behaviour can be disabled, both at the document and at the object level, by setting the dynamic parameter to false (to ignore new fields) or to strict (to throw an exception if an unknown field is encountered).

As part of the exception handling, you can push the message to another index where dynamic mapping is enabled.
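
For reference, a strict mapping could be declared along these lines. This is a sketch: the index name test-strict is made up, and the doc type level assumes the 6.x mapping syntax. Note that "dynamic": "strict" only governs unknown fields. A value such as "abc" in an integer field is rejected with a mapping exception whether or not strict is set, unless ignore_malformed is enabled on that field.

```
PUT test-strict
{
  "mappings": {
    "doc": {
      "dynamic": "strict",
      "properties": {
        "status_code": {
          "type": "integer"
        }
      }
    }
  }
}
```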

3 Comments

There's a way to let ES do this itself without having to catch exceptions on the client side.
Could you give a clue how to do that?
@Val, it's funny that when I ask "Is there a way?" I get the answer "Yes, there is a way" without any details. Unfortunately, I can't post the full mappings and documents; this information is not meant to be seen in public.
