We want to move a daily unique-IP logger written in PHP (a bad implementation) that we set up years ago to Elasticsearch.

We're not completely sure how we are going to structure it yet, but we are considering logging every request as a single document, to allow more flexible, dynamic analysis.

Something like this:

{
  "_index": "logger",
  "_type": "_doc",
  "_id": "-1q04XEBfzHON7FKVuMY", // Auto-generated
  "_source": {
    "ip": "211.543.232.533",
    "user": "",
    "request": "GET /index.php HTTP/1.1",
    "status": 200,
    "bytes": 10984,
    "refer": "https://www.google.com/search?q=some%20website",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36",
    "domain": "example.org",
    "timestamp": 1662865208000
  }
}
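As a minimal sketch of the one-document-per-request idea, the hypothetical helper below builds a document with the same fields as the example above from values the logger would already have. The function name and its parameters are illustrative assumptions; the timestamp is stored as epoch milliseconds to match the example.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def make_log_doc(ip, request, status, nbytes, referer, user_agent, domain,
                 when, user=""):
    """Build one request document shaped like the example document.

    Hypothetical helper: field names mirror the example (including "refer"
    and "user-agent"). `when` must be a timezone-aware datetime.
    """
    return {
        "ip": ip,
        "user": user,
        "request": request,
        "status": status,
        "bytes": nbytes,
        "refer": referer,
        "user-agent": user_agent,
        "domain": domain,
        # Exact integer epoch milliseconds (timedelta // timedelta is an int).
        "timestamp": (when - EPOCH) // timedelta(milliseconds=1),
    }
```

Each returned dict would then be indexed as its own document, letting Elasticsearch generate the _id as in the example.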

The issue is that the same IP may appear multiple times. Is it possible to count the unique IPs (that is, requests from distinct IPs) seen since 24:00 (midnight)?
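The "since midnight" boundary has to be expressed as an epoch-millisecond value in the range query. A small sketch of computing it, assuming the day is defined in UTC (a local-time day would need a real timezone instead); the function names are hypothetical. If timestamp is mapped as a date field, Elasticsearch date math such as "gte": "now/d" (start of the current day, in UTC unless a time_zone is given) could be used instead of a precomputed number.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
DAY_MS = 24 * 60 * 60 * 1000  # milliseconds in one UTC day


def start_of_day_ms(ts_ms):
    """Round an epoch-ms timestamp down to 00:00 UTC of the same day."""
    return ts_ms - ts_ms % DAY_MS


def start_of_today_ms(now=None):
    """Epoch ms of the most recent midnight (UTC) before `now`."""
    if now is None:
        now = datetime.now(timezone.utc)
    ms = (now - EPOCH) // timedelta(milliseconds=1)
    return start_of_day_ms(ms)
```

For the example document's timestamp, 1662865208000 (2022-09-11 03:00:08 UTC), this yields 1662854400000.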

For instance, say there are 6 documents: 3 with the ip field set to 211.543.232.533, 2 with 192.168.1.1 and 1 with 127.0.0.1. How can I count this as 3 hits?
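To pin down the expected result, here is a plain-Python reference for the count being asked for: the number of distinct ip values among documents at or after the midnight boundary. It only illustrates the semantics on the six-document example (the timestamps are made up); it is not how Elasticsearch computes the value.

```python
def reference_unique_ip_count(docs, since_ms):
    """Count distinct 'ip' values among docs with timestamp >= since_ms."""
    return len({d["ip"] for d in docs if d["timestamp"] >= since_ms})


MIDNIGHT_MS = 1662854400000  # 2022-09-11 00:00 UTC

# The six documents from the example: 3 + 2 + 1 requests from 3 distinct IPs.
example_docs = (
    [{"ip": "211.543.232.533", "timestamp": MIDNIGHT_MS + 1000}] * 3
    + [{"ip": "192.168.1.1", "timestamp": MIDNIGHT_MS + 2000}] * 2
    + [{"ip": "127.0.0.1", "timestamp": MIDNIGHT_MS + 3000}]
)

print(reference_unique_ip_count(example_docs, MIDNIGHT_MS))  # 3
```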

Maybe a search that looks something like this:

POST /logger/_count
{
    "query": {
        "bool": {
            "must": [
                {
                    "range": {
                        "timestamp": {
                            "gt": 1662854400000 // Epoch ms at 00:00 (midnight) UTC, 2022-09-11
                        }
                    }
                }
            ]
            // And then something here? I'm not really sure what to do
        }
    }
}

Can this be expressed in a search, or do I need to set up a specific mapping type or analyzer?
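On the mapping question: cardinality and terms aggregations need a field with doc values, such as keyword, so a string mapped only as text cannot be aggregated on directly. With dynamic mapping, strings become text with a .keyword sub-field, and whole numbers such as the timestamp become long rather than date. Below is a sketch of an explicit mapping for the example document, written as a Python dict; choosing keyword for ip (rather than the ip type, which would reject an invalid address such as 211.543.232.533) and date with epoch_millis for timestamp are assumptions, not requirements.

```python
import json

# Hypothetical explicit mapping for the "logger" index (typeless format).
# Field names follow the example document.
LOGGER_MAPPING = {
    "mappings": {
        "properties": {
            "ip": {"type": "keyword"},  # exact values, usable in aggregations
            "user": {"type": "keyword"},
            "request": {"type": "text"},
            "status": {"type": "integer"},
            "bytes": {"type": "long"},
            "refer": {"type": "keyword"},
            "user-agent": {"type": "text"},
            "domain": {"type": "keyword"},
            # Stored as epoch ms, but treated as a date (enables date math).
            "timestamp": {"type": "date", "format": "epoch_millis"},
        }
    }
}

if __name__ == "__main__":
    # Print the body that would be sent when creating the index.
    print(json.dumps(LOGGER_MAPPING, indent=2))
```

With the 7.x Python client this could be passed as es.indices.create(index="logger", body=LOGGER_MAPPING) before any documents are indexed.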

Currently there are around 500,000 requests per day, of which around 30,000 are unique.

  • Did you try this: elastic.co/guide/en/elasticsearch/reference/current/… ? Commented Sep 11, 2022 at 8:09
  • @star67 The issue here is that it's not precise above a certain number of documents. I can give it a shot and update the question once I have tried it. I'm also curious how well the server will handle such a query, and how slow it will be. Commented Sep 11, 2022 at 18:58

2 Answers


If you want precise results, you can consider using the composite aggregation with a terms source. As mentioned in the official documentation, though, it is a costly aggregation, so load-test it before using it in production.
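A minimal sketch of the composite approach in Python, assuming a callable search(body) that sends a request body to the logger index and returns the parsed JSON response; the function names are hypothetical. It pages through every distinct ip using the composite aggregation's after_key and counts the buckets, so the result is exact, at the cost of one round trip per page.

```python
def composite_body(since_ms, page_size=1000, field="ip", after=None):
    """Request body for one page of a composite aggregation over `field`."""
    composite = {
        "size": page_size,
        "sources": [{"ip": {"terms": {"field": field}}}],
    }
    if after is not None:
        composite["after"] = after  # resume after the previous page's last key
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {"range": {"timestamp": {"gte": since_ms}}},
        "aggs": {"unique_ips": {"composite": composite}},
    }


def exact_unique_ip_count(search, since_ms, page_size=1000, field="ip"):
    """Count distinct values exactly by paging through all composite buckets.

    `search` is any callable taking a request body (dict) and returning the
    parsed search response (dict); it is an assumption of this sketch.
    """
    total = 0
    after = None
    while True:
        resp = search(composite_body(since_ms, page_size, field, after))
        agg = resp["aggregations"]["unique_ips"]
        buckets = agg["buckets"]
        total += len(buckets)  # one bucket per distinct ip
        after = agg.get("after_key")
        if not buckets or after is None:
            return total
```

With the 7.x elasticsearch-py client, search could be lambda body: es.search(index="logger", body=body). If ip was created by dynamic mapping (text plus a keyword sub-field), pass field="ip.keyword". With ~30,000 unique IPs a day and page_size=1000, this is roughly 30 requests.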


In the end I went with star67's suggestion from the comments and used a cardinality aggregation. Once precision_threshold is set high enough, the count is off by only a very small amount, which is acceptable for this particular use case. More info here

This is the final request:


POST /logger/_search
{
    "size": 0,
    "query": {
        "bool": {
            "must": [
                {
                    "range": {
                        "timestamp": {
                            "gt": 1662854400000
                        }
                    }
                }
            ]
        }
    },
    "aggs": {
        "ip_count": {
            "cardinality": {
                "field": "ip",
                "precision_threshold": 100000
            }
        }
    }
}

Which returns something like:

{
    "took": 507,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 10000,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "ip_count": {
            "value": 91906
        }
    }
}
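For reference, a sketch that builds the same cardinality request and reads the count from a response shaped like the one above; the helper names are hypothetical, and it uses "gte" so a request at exactly 00:00:00.000 is included. One detail from the cardinality aggregation documentation: precision_threshold is capped at 40000, so the 100000 used above has the same effect as 40000. Counts below the threshold are expected to be close to exact, and memory use is about precision_threshold * 8 bytes.

```python
MAX_PRECISION_THRESHOLD = 40000  # documented upper limit for cardinality


def cardinality_body(since_ms, precision_threshold=40000, field="ip"):
    """Approximate distinct count of `field` since `since_ms` (epoch ms)."""
    return {
        "size": 0,  # no hits needed, only the aggregation
        "query": {"range": {"timestamp": {"gte": since_ms}}},
        "aggs": {
            "ip_count": {
                "cardinality": {
                    "field": field,
                    # Values above 40000 behave like 40000, so clamp explicitly.
                    "precision_threshold": min(precision_threshold,
                                               MAX_PRECISION_THRESHOLD),
                }
            }
        },
    }


def read_ip_count(response):
    """Extract the approximate unique-IP count from a search response."""
    return response["aggregations"]["ip_count"]["value"]
```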

With ~500,000 documents this performs acceptably (~500 ms) for a query that doesn't need to run often. If this search will be under high load, consider a different approach.
