
I have a school project in which I use the ELK stack.

I have a lot of data and I want to know which log lines are duplicates, and how many duplicates there are for each line, based on log level, server and time range.

I tried this query, which successfully extracts the duplicate counts:

GET /_all/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "beat.hostname": "server-x"
          }
        },
        {
          "match": {
            "log_level": "WARNING"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "now-48h",
              "lte": "now"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "duplicateNames": {
      "terms": {
        "field": "message_description.keyword",
        "min_doc_count": 2,
        "size": 10000
      }
    }
  }
}

It successfully gives me the output:

"aggregations" : {
"duplicateNames" : {
  "doc_count_error_upper_bound" : 0,
  "sum_other_doc_count" : 0,
  "buckets" : [
    {
      "key" : "AuthToken not found [ ]",
      "doc_count" : 657
    }
  ]
}

When I run the very same query and only change the log_level from WARNING to CRITICAL, it gives me 0 buckets. This is strange because I can see in Kibana that there are duplicate message_description field values. Does this have something to do with the .keyword subfield, or maybe with the length of message_description?

I hope someone can help me with this weird problem.

Edit: Here are two documents that have exactly the same message_description. Why don't they show up in my results?

 {
        "_index" : "filebeat-2019.09.17",
        "_type" : "_doc",
        "_id" : "yYzDP20BiDGBoVteKHjZ",
        "_score" : 10.144365,
        "_source" : {
          "beat" : {
            "name" : "graylog",
            "hostname" : "server-x",
            "version" : "6.8.2"
          },
          "message" : """[2019-09-17 17:06:57] request.CRITICAL: Uncaught PHP Exception ErrorException: "Warning: include(/data/httpd/xxx/xxx/var/cache/dev/overblog/graphql-bundle/__definitions__/QueryType.php): failed to open stream: No such file or directory" at /data/httpd/xxx/xxx/vendor/composer/ClassLoader.php line 444 {"exception":"[object] (ErrorException(code: 0): Warning: include(/data/httpd/xxx/xxx/var/cache/dev/overblog/graphql-bundle/__definitions__/QueryType.php): failed to open stream: No such file or directory at /data/httpd/xxx/xxx/vendor/composer/ClassLoader.php:444)"} []""",
          "@version" : "1",
          "source" : "/data/httpd/xxx/xxx/var/log/dev.log",
          "tags" : [
            "beats_input_codec_plain_applied",
            "_grokparsefailure",
            "_dateparsefailure"
          ],
          "timestamp" : "2019-09-17 17:06:57",
          "input" : {
            "type" : "log"
          },
          "offset" : 54819,
          "prospector" : {
            "type" : "log"
          },
          "application" : "request",
          "log_level" : "CRITICAL",
          "stack_trace" : """{"exception":"[object] (ErrorException(code: 0): Warning: include(/data/httpd/xxx/xxx/var/cache/dev/overblog/graphql-bundle/__definitions__/QueryType.php): failed to open stream: No such file or directory at /data/httpd/xxx/xxx/vendor/composer/ClassLoader.php:444)"} []""",
          "message_description" : """Uncaught PHP Exception ErrorException: "Warning: include(/data/httpd/xxx/xxx/var/cache/dev/overblog/graphql-bundle/__definitions__/QueryType.php): failed to open stream: No such file or directory" at /data/httpd/xxx/xxx/vendor/composer/ClassLoader.php line 444""",
          "@timestamp" : "2019-09-17T15:06:57.436Z",
          "host" : {
            "name" : "graylog"
          },
          "log" : {
            "file" : {
              "path" : "/data/httpd/xxx/xxx/var/log/dev.log"
            }
          }
        }
      },
      {
        "_index" : "filebeat-2019.09.17",
        "_type" : "_doc",
        "_id" : "CYzDP20BiDGBoVteKHna",
        "_score" : 10.144365,
        "_source" : {
          "beat" : {
            "name" : "graylog",
            "hostname" : "server-x",
            "version" : "6.8.2"
          },
          "message" : """[2019-09-17 17:06:56] request.CRITICAL: Uncaught PHP Exception ErrorException: "Warning: include(/data/httpd/xxx/xxx/var/cache/dev/overblog/graphql-bundle/__definitions__/QueryType.php): failed to open stream: No such file or directory" at /data/httpd/xxx/xxx/vendor/composer/ClassLoader.php line 444 {"exception":"[object] (ErrorException(code: 0): Warning: include(/data/httpd/xxx/xxx/var/cache/dev/overblog/graphql-bundle/__definitions__/QueryType.php): failed to open stream: No such file or directory at /data/httpd/xxx/xxx/vendor/composer/ClassLoader.php:444)"} []""",
          "@version" : "1",
          "source" : "/data/httpd/xxx/xxx/var/log/dev.log",
          "tags" : [
            "beats_input_codec_plain_applied",
            "_grokparsefailure",
            "_dateparsefailure"
          ],
          "timestamp" : "2019-09-17 17:06:56",
          "input" : {
            "type" : "log"
          },
          "offset" : 45716,
          "prospector" : {
            "type" : "log"
          },
          "application" : "request",
          "log_level" : "CRITICAL",
          "stack_trace" : """{"exception":"[object] (ErrorException(code: 0): Warning: include(/data/httpd/xxx/xxx/var/cache/dev/overblog/graphql-bundle/__definitions__/QueryType.php): failed to open stream: No such file or directory at /data/httpd/xxx/xxx/vendor/composer/ClassLoader.php:444)"} []""",
          "message_description" : """Uncaught PHP Exception ErrorException: "Warning: include(/data/httpd/xxx/xxx/var/cache/dev/overblog/graphql-bundle/__definitions__/QueryType.php): failed to open stream: No such file or directory" at /data/httpd/xxx/xxx/vendor/composer/ClassLoader.php line 444""",
          "@timestamp" : "2019-09-17T15:06:57.426Z",
          "host" : {
            "name" : "graylog"
          },
          "log" : {
            "file" : {
              "path" : "/data/httpd/xxx/xxx/var/log/dev.log"
            }
          }
        }
      }
  • Can you show your mapping and two sample documents that you think should show up in your results? Commented Sep 19, 2019 at 9:08
  • I added the two documents that I think should show up in my results. Commented Sep 19, 2019 at 14:28

1 Answer

What happens is that the message_description field is longer than 256 characters, so its keyword subfield is not indexed: the default dynamic mapping creates keyword subfields with "ignore_above": 256, and values longer than that are silently skipped at index time. Run GET filebeat-2019.09.17 and inspect the mapping to confirm this.
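You can also ask for just that one field's mapping. The response below is a trimmed, illustrative excerpt (the exact wrapper objects vary by Elasticsearch version); the part to look for is the ignore_above value on the keyword subfield:

```
GET filebeat-2019.09.17/_mapping/field/message_description

// relevant excerpt of the response (illustrative):
"message_description": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
}
```

Your CRITICAL messages are long stack-trace-style strings well over 256 characters, while the WARNING messages are short, which is why only the WARNING query returned buckets.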

What you can do is augment that limit by modifying the mapping of the field like this:

PUT filebeat-*/_doc/_mapping
{
  "properties": {
    "message_description": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 500
        }
      }
    }
  }
}

And then update the data already present in those indices, so the existing documents are re-indexed against the new mapping:

POST filebeat-*/_update_by_query
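On large or actively written indices, the plain command can abort on version conflicts. A common variant (an assumption on my part, not part of the original answer) tells it to skip conflicting documents and run asynchronously:

```
POST filebeat-*/_update_by_query?conflicts=proceed&wait_for_completion=false
```

With wait_for_completion=false the call returns a task id, which you can poll with the tasks API (GET _tasks/<task_id>) to see when the re-index has finished.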

Once that's done, your query will magically work again ;-)


2 Comments

I thank you very, very much for this clear explanation!
Oof, I'm getting an error saying: Types cannot be provided in put mapping requests, unless the include_type_name parameter is set to true. Do you know how I can solve this issue? I know it has something to do with the Elasticsearch version and that include_type_name is deprecated from v7.x.x. I am running version 7.3. So what is the best way to let your query work?
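For anyone hitting that same error: mapping types were removed in Elasticsearch 7.x, so on 7.3 the typeless endpoint should work with the same body, just without _doc in the path:

```
PUT filebeat-*/_mapping
{
  "properties": {
    "message_description": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 500
        }
      }
    }
  }
}
```

Note that 500 is just the value suggested in the answer; set ignore_above high enough to cover your longest expected message_description.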
