I need to model a website with users and articles, where each user can interact with (read, open, etc.) any article many times. I want to do it in one Elasticsearch index using the following nested mapping:
```json
{
  "mappings": {
    "user": {
      "properties": {
        "user_id": {"type": "string"},
        "interactions": {
          "type": "nested",
          "properties": {
            "article_id": {"type": "string"},
            "interact_date": {"type": "date"}
          }
        }
      }
    }
  }
}
```
An example of an indexed document:
```json
{
  "user_id": 20,
  "interactions": [
    {"article_id": "111", "interact_date": "2015-01-01"},
    {"article_id": "111", "interact_date": "2015-01-02"},
    {"article_id": "222", "interact_date": "2015-01-01"}
  ]
}
```
I need to run the following aggregations on the data:
- Total number of interactions per day. The following query looks OK to me:

```
GET /_search
{
  "size": 0,
  "aggs": {
    "by_date": {
      "nested": { "path": "interactions" },
      "aggs": {
        "m_date": {
          "terms": { "field": "interactions.interact_date" }
        }
      }
    }
  }
}
```

The example doc should add 1 for 2015-01-02 and 2 for 2015-01-01 to the aggregation.
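A minimal sketch of one way to get per-day totals, assuming the Python dict below is sent as the JSON body of `GET /_search` (no client code shown). It swaps the `terms` aggregation for a `date_histogram`, because `terms` on a date field buckets by exact timestamp, so values with a time part would not collapse into one bucket per day. Assumption: `interval` is the pre-7.2 parameter name; newer versions call it `calendar_interval`. The helper `interactions_per_day` is hypothetical and only computes the expected bucket counts locally for the example doc.

```python
from collections import Counter

# The example user document from the question.
doc = {
    "user_id": 20,
    "interactions": [
        {"article_id": "111", "interact_date": "2015-01-01"},
        {"article_id": "111", "interact_date": "2015-01-02"},
        {"article_id": "222", "interact_date": "2015-01-01"},
    ],
}

# Search body: the nested agg switches to the "interactions" sub-documents,
# and date_histogram puts each interaction into a one-day bucket.
# Assumption: "interval" is the pre-7.2 name ("calendar_interval" on 7.2+).
interactions_per_day_query = {
    "size": 0,
    "aggs": {
        "by_date": {
            "nested": {"path": "interactions"},
            "aggs": {
                "per_day": {
                    "date_histogram": {
                        "field": "interactions.interact_date",
                        "interval": "day",
                        "format": "yyyy-MM-dd",
                    }
                }
            },
        }
    },
}


def interactions_per_day(docs):
    """Hypothetical plain-Python reference for the expected bucket doc_counts."""
    counts = Counter()
    for d in docs:
        for interaction in d["interactions"]:
            # Keep only the YYYY-MM-DD part so timestamps fall into day buckets.
            counts[interaction["interact_date"][:10]] += 1
    return dict(counts)


print(interactions_per_day([doc]))  # {'2015-01-01': 2, '2015-01-02': 1}
```

Each `per_day` bucket's `doc_count` counts nested interaction documents, which matches "total interactions per day".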
- Number of unique users interacting per day. If a specific user interacted with several articles on the same date, the user should be counted only once. In PostgreSQL it's a simple query against a table with 3 columns [user_id, article_id, interact_date]:

```sql
SELECT dt, count(uid)
FROM (
    SELECT interact_date::TIMESTAMP::DATE dt, user_id uid
    FROM interactions
    GROUP BY interact_date::TIMESTAMP::DATE, user_id
) by_date
GROUP BY dt;
```

How can I do the same in Elasticsearch? The example doc should add **1** for **2015-01-02** and **1** for **2015-01-01** to the aggregation.
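One possible approach, sketched under the assumption that each user is stored as exactly one document (as in the mapping above): bucket the nested interactions by day, then add a `reverse_nested` sub-aggregation, which joins back to the root documents. Its `doc_count` counts each root (user) document once per bucket, which plays the role of the inner `GROUP BY ... user_id` in the SQL. The dict is meant as the `GET /_search` body; `unique_users_per_day` is a hypothetical helper that mirrors the PostgreSQL query locally.

```python
from collections import defaultdict

# The example user document from the question.
doc = {
    "user_id": 20,
    "interactions": [
        {"article_id": "111", "interact_date": "2015-01-01"},
        {"article_id": "111", "interact_date": "2015-01-02"},
        {"article_id": "222", "interact_date": "2015-01-01"},
    ],
}

unique_users_per_day_query = {
    "size": 0,
    "aggs": {
        "by_date": {
            "nested": {"path": "interactions"},
            "aggs": {
                "per_day": {
                    "date_histogram": {
                        "field": "interactions.interact_date",
                        "interval": "day",  # "calendar_interval" on 7.2+
                        "format": "yyyy-MM-dd",
                    },
                    "aggs": {
                        # Join back to the root (user) documents; this bucket's
                        # doc_count counts each root document once per day.
                        "users": {"reverse_nested": {}}
                    },
                }
            },
        }
    },
}


def unique_users_per_day(docs):
    """Hypothetical plain-Python equivalent of the PostgreSQL GROUP BY query."""
    users = defaultdict(set)
    for d in docs:
        for interaction in d["interactions"]:
            users[interaction["interact_date"][:10]].add(d["user_id"])
    return {day: len(ids) for day, ids in users.items()}


print(unique_users_per_day([doc]))  # {'2015-01-01': 1, '2015-01-02': 1}
```

The unique-user count would be read from `per_day.buckets[*].users.doc_count`. If one user could ever span several documents, this would over-count, and a `cardinality` sub-aggregation on `user_id` (which is approximate) would be the usual alternative.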
- How can I add interactions with `_update` without re-indexing the whole document?
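A hedged sketch of a scripted partial update, assuming Elasticsearch 6.x or later with Painless (`source`; 5.x used `inline`, and 1.x/2.x used Groovy scripts instead). The body would go to something like `POST /<index>/_update/<user_id>` on 7.x+, or `POST /<index>/<type>/<user_id>/_update` on versions that still have mapping types. `add_interaction_body` and `apply_locally` are hypothetical helpers; the latter only mimics the script's effect on `_source`.

```python
import copy

# Painless script: create the list if it is missing, then append one interaction.
ADD_INTERACTION_SCRIPT = (
    "if (ctx._source.interactions == null) { ctx._source.interactions = []; } "
    "ctx._source.interactions.add(params.interaction);"
)


def add_interaction_body(article_id, interact_date):
    """Build the _update request body for one new interaction."""
    return {
        "script": {
            "lang": "painless",
            "source": ADD_INTERACTION_SCRIPT,
            "params": {
                "interaction": {
                    "article_id": article_id,
                    "interact_date": interact_date,
                }
            },
        }
    }


def apply_locally(source, body):
    """Mimic what the script does to _source, without touching the input."""
    updated = copy.deepcopy(source)
    if updated.get("interactions") is None:
        updated["interactions"] = []
    updated["interactions"].append(copy.deepcopy(body["script"]["params"]["interaction"]))
    return updated


example = {
    "user_id": 20,
    "interactions": [
        {"article_id": "111", "interact_date": "2015-01-01"},
        {"article_id": "111", "interact_date": "2015-01-02"},
        {"article_id": "222", "interact_date": "2015-01-01"},
    ],
}
print(len(apply_locally(example, add_interaction_body("333", "2015-01-03"))["interactions"]))  # 4
```

Note that `_update` only avoids sending the whole document from the client: Elasticsearch still loads `_source`, runs the script, and reindexes the entire document, including all of its nested interaction documents. With many interactions per user, each update gets progressively more expensive.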
- How can I filter users by specific articles, i.e. count a user once in the by-date aggregation only if they interacted with one of the specified articles?
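A minimal sketch, interpreting the requirement as "per day, count users whose interactions on that day include one of the given articles". It places a `filter` aggregation inside the `nested` one, so the filter applies to individual interactions, then buckets by day and uses `reverse_nested` to count users. The top-level nested query is optional and only narrows the matched users. Assumptions: `interactions.article_id` is indexed for exact matching (`not_analyzed` / `keyword`), and the dict is the `GET /_search` body. Both function names are hypothetical.

```python
from collections import defaultdict


def unique_users_per_day_for_articles_query(article_ids):
    """Search body: unique users per day, counting only the given articles."""
    return {
        "size": 0,
        # Optional: skip users who never interacted with any of the articles.
        "query": {
            "nested": {
                "path": "interactions",
                "query": {"terms": {"interactions.article_id": article_ids}},
            }
        },
        "aggs": {
            "by_date": {
                "nested": {"path": "interactions"},
                "aggs": {
                    "only_articles": {
                        # Inside the nested agg this filter sees single interactions.
                        "filter": {"terms": {"interactions.article_id": article_ids}},
                        "aggs": {
                            "per_day": {
                                "date_histogram": {
                                    "field": "interactions.interact_date",
                                    "interval": "day",  # "calendar_interval" on 7.2+
                                    "format": "yyyy-MM-dd",
                                },
                                # doc_count here = distinct users per day.
                                "aggs": {"users": {"reverse_nested": {}}},
                            }
                        },
                    }
                },
            }
        },
    }


def unique_users_per_day_for_articles(docs, article_ids):
    """Plain-Python reference for the expected users.doc_count per day."""
    wanted = set(article_ids)
    users = defaultdict(set)
    for d in docs:
        for interaction in d["interactions"]:
            if interaction["article_id"] in wanted:
                users[interaction["interact_date"][:10]].add(d["user_id"])
    return {day: len(ids) for day, ids in users.items()}


doc = {
    "user_id": 20,
    "interactions": [
        {"article_id": "111", "interact_date": "2015-01-01"},
        {"article_id": "111", "interact_date": "2015-01-02"},
        {"article_id": "222", "interact_date": "2015-01-01"},
    ],
}
print(unique_users_per_day_for_articles([doc], ["222"]))  # {'2015-01-01': 1}
```

If instead a user who touched one of the articles on any day should be counted on every day they were active, the filter aggregation would be dropped and only the top-level nested query kept.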
Thank you