I need to model a website with users and articles, where each user can interact (read, open, etc.) with any article many times. I want to model this data in one Elasticsearch index using the following nested mapping:

{
    "mappings": {
        "user": {
            "properties": {
                "user_id": {"type": "string"},
                "interactions": {
                    "type": "nested",
                    "properties": {
                        "article_id": {"type": "string"},
                        "interact_date": {"type": "date"}
                    }
                }
            }
        }
    }
}

Example of an indexed document:

{
    "user_id": 20,
    "interactions": [
        {"article_id": "111", "interact_date": "2015-01-01"},
        {"article_id": "111", "interact_date": "2015-01-02"},
        {"article_id": "222", "interact_date": "2015-01-01"}
     ]
}

I need to do the following aggregations on the data:

  1. Total number of interactions per day, done by nested aggregation:

    GET /_search
    {
        "size": 0,
        "aggs": {
            "by_date": {
                "nested": {
                    "path": "interactions"
                },
                "aggs": {
                    "m_date": {"terms": {"field": "interactions.interact_date"}}
                }
            }
        }
    }
    
  2. Number of unique users interacting per day. If a specific user interacted with several articles within the same date range, that user should be counted only once. In Postgres it's a simple query for a table with 3 columns [user_id, article_id, interact_date]:

    SELECT dt, count(uid)
    FROM (SELECT interact_date::TIMESTAMP::DATE dt, user_id uid FROM interactions
            GROUP BY interact_date::TIMESTAMP::DATE, user_id) by_date
    GROUP BY dt;
    

    How can I do the same in an Elasticsearch index?

  3. How can I add interactions via _update without re-indexing the whole document?

  4. How can I filter users by specific articles, i.e. count a user once in the per-date aggregation only if they interacted with one of the specified articles?

Thank you

1 Answer

Number of unique users interacting per day:

{
  "size": 0,
  "aggs": {
    "nested_agg": {
      "nested": {
        "path": "interactions"
      },
      "aggs": {
        "per_day": {
          "date_histogram": {
            "field": "interactions.interact_date",
            "interval": "day",
            "min_doc_count": 1
          },
          "aggs": {
            "users_count": {
              "reverse_nested": {},
              "aggs": {
                "uniques": {
                  "cardinality": {
                    "field": "user_id"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
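To make the semantics of that aggregation concrete, here is a minimal in-memory sketch in Python. It does not talk to Elasticsearch; the function name `unique_users_per_day` and the second sample user ("21") are made up for illustration. It mirrors what nested → date_histogram → reverse_nested → cardinality computes: for each day, the number of distinct parent documents (users) that have at least one interaction on that day.

```python
from collections import defaultdict


def unique_users_per_day(docs):
    """Return {day: number of distinct users with at least one interaction that day}."""
    users_by_day = defaultdict(set)
    for doc in docs:
        for interaction in doc["interactions"]:
            # Keep only the YYYY-MM-DD part, like a daily date_histogram bucket.
            day = interaction["interact_date"][:10]
            # A set collapses repeated interactions by the same user on one day.
            users_by_day[day].add(doc["user_id"])
    return {day: len(users) for day, users in sorted(users_by_day.items())}


# The document from the question plus one hypothetical extra user.
docs = [
    {"user_id": "20", "interactions": [
        {"article_id": "111", "interact_date": "2015-01-01"},
        {"article_id": "111", "interact_date": "2015-01-02"},
        {"article_id": "222", "interact_date": "2015-01-01"},
    ]},
    {"user_id": "21", "interactions": [
        {"article_id": "222", "interact_date": "2015-01-02"},
    ]},
]

print(unique_users_per_day(docs))  # {'2015-01-01': 1, '2015-01-02': 2}
```

Note that the real cardinality aggregation is approximate for large numbers of distinct values, whereas this sketch is exact.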

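How can I add interactions via _update without re-indexing the whole document?

Not without reindexing. This follows from how nested objects are stored: to update, add, or remove a nested object, the whole document has to be reindexed. A scripted update can apply the change on the server, so the client does not have to fetch and resend the document, but Elasticsearch still reindexes the full document internally.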
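If the goal is only to avoid the fetch/modify/put round trip in the client, a scripted update is one option. The sketch below just builds a request body as a Python dict; the helper name `build_append_interaction_update` is hypothetical. It assumes a version of Elasticsearch where Painless is available and the script body goes under `source`; the mapping in the question (type "string") suggests an older release, where the script language and field names differ, so treat this as an outline rather than a drop-in request.

```python
import json


def build_append_interaction_update(article_id, interact_date):
    """Build an _update body that appends an interaction only if it is not already present."""
    interaction = {"article_id": article_id, "interact_date": interact_date}
    return {
        "script": {
            "lang": "painless",  # assumption: Painless is available on the cluster
            "source": (
                "if (!ctx._source.interactions.contains(params.interaction)) {"
                " ctx._source.interactions.add(params.interaction)"
                " }"
            ),
            # Passing the new object as a parameter keeps the script text constant.
            "params": {"interaction": interaction},
        }
    }


body = build_append_interaction_update("333", "2015-01-03")
print(json.dumps(body, indent=2))
```

The body would be sent to the document's `_update` endpoint. The Update API also accepts a `retry_on_conflict` parameter, which may help when several writers touch the same user document concurrently.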

How can I filter users by specific articles, i.e. count a user once in the per-date aggregation only if they interacted with one of the specified articles?

{
  "size": 0,
  "query": {
    "nested": {
      "path": "interactions",
      "query": {
        "term": {
          "interactions.article_id": {
            "value": "222"
          }
        }
      }
    }
  },
  "aggs": {
    "nested_agg": {
      "nested": {
        "path": "interactions"
      },
      "aggs": {
        "filtered": {
          "filter": {
            "term": {
              "interactions.article_id": {
                "value": "222"
              }
            }
          },
          "aggs": {
            "per_day": {
              "date_histogram": {
                "field": "interactions.interact_date",
                "interval": "day",
                "min_doc_count": 1
              },
              "aggs": {
                "users_count": {
                  "reverse_nested": {},
                  "aggs": {
                    "uniques": {
                      "cardinality": {
                        "field": "user_id"
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
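The article filter appears twice in that request for a reason: the top-level nested query only selects which users match, while the nested aggregation still sees every interaction of those users, so the inner filter is what restricts the buckets to the chosen articles. A small in-memory Python sketch of the intended result (not an Elasticsearch call; `unique_users_per_day_for_articles` is a made-up name):

```python
from collections import defaultdict


def unique_users_per_day_for_articles(docs, article_ids):
    """Count distinct users per day, considering only interactions with the given articles."""
    wanted = set(article_ids)
    users_by_day = defaultdict(set)
    for doc in docs:
        for interaction in doc["interactions"]:
            # Equivalent of the "filtered" aggregation: skip other articles.
            if interaction["article_id"] in wanted:
                day = interaction["interact_date"][:10]
                users_by_day[day].add(doc["user_id"])
    return {day: len(users) for day, users in sorted(users_by_day.items())}


# The document from the question.
docs = [
    {"user_id": "20", "interactions": [
        {"article_id": "111", "interact_date": "2015-01-01"},
        {"article_id": "111", "interact_date": "2015-01-02"},
        {"article_id": "222", "interact_date": "2015-01-01"},
    ]},
]

print(unique_users_per_day_for_articles(docs, ["222"]))  # {'2015-01-01': 1}
```

For more than one article, the `term` query and filter in the request would be replaced by a `terms` query listing all the article ids.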

2 Comments

Thank you @AndreiStefan. Regarding the re-index: I mean, is it possible to add a new interaction with a "safe" update operation that adds the interaction only if it does not already exist, instead of fetching the document, updating it in the client and putting it back? I have many threads that update interactions for the same user and want to update docs without distributed locks.
Use document versioning (elastic.co/guide/en/elasticsearch/reference/current/…) to make such updates.
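The versioning suggestion amounts to optimistic concurrency control: read the document together with its version, modify it, and write it back only if the version is unchanged, retrying on conflict. The following Python sketch simulates that loop against an in-memory stand-in. `InMemoryIndex`, `VersionConflict` and `add_interaction` are hypothetical names for illustration and are not part of any Elasticsearch client.

```python
import copy


class VersionConflict(Exception):
    """Raised when a write is based on a stale version."""


class InMemoryIndex:
    """Toy stand-in for an index that keeps a version number per document."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def get(self, doc_id):
        version, source = self._docs.get(doc_id, (0, {"interactions": []}))
        return version, copy.deepcopy(source)

    def index(self, doc_id, source, expected_version):
        current_version, _ = self._docs.get(doc_id, (0, None))
        if current_version != expected_version:
            raise VersionConflict("expected %d, found %d" % (expected_version, current_version))
        self._docs[doc_id] = (current_version + 1, copy.deepcopy(source))
        return current_version + 1


def add_interaction(index, doc_id, interaction, max_retries=5):
    """Append an interaction if it is missing; re-read and retry on version conflicts."""
    for _ in range(max_retries):
        version, source = index.get(doc_id)
        if interaction in source["interactions"]:
            return version  # already present, nothing to write
        source["interactions"].append(interaction)
        try:
            return index.index(doc_id, source, expected_version=version)
        except VersionConflict:
            continue  # another writer got in first; start over from a fresh read
    raise VersionConflict("gave up after %d attempts" % max_retries)


store = InMemoryIndex()
first = {"article_id": "111", "interact_date": "2015-01-01"}
second = {"article_id": "222", "interact_date": "2015-01-01"}
v1 = add_interaction(store, "20", first)   # creates version 1
v2 = add_interaction(store, "20", first)   # duplicate, version stays 1
v3 = add_interaction(store, "20", second)  # appended, version 2
print(v1, v2, v3)  # 1 1 2
```

No distributed lock is needed with this pattern: a writer that loses the race simply re-reads and tries again.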