Aggregate Elasticsearch index by nested field values


(Hershs) #1

I need to model a web site with users and articles, where each user can interact (read, open, etc.) with any article many times. I want to do it in one Elasticsearch index using the following nested mapping:

{
    "mappings": {
        "user": {
            "properties": {
                "user_id": {"type": "string"},
                "interactions": {
                    "type": "nested",
                    "properties": {
                        "article_id": {"type": "string"},
                        "interact_date": {"type": "date"}
                    }
                }
            }
        }
    }
}

Example of an indexed document:

{
    "user_id": 20,
    "interactions": [
        {"article_id": "111", "interact_date": "2015-01-01"},
        {"article_id": "111", "interact_date": "2015-01-02"},
        {"article_id": "222", "interact_date": "2015-01-01"}
    ]
}

I need to do the following aggregations on the data:

  1. Total number of interactions per day. The following query looks OK to me:

     GET /_search
     {
         "size": 0,
         "aggs": {
             "by_date": {
                 "nested": {
                     "path": "interactions"
                 },
                 "aggs": {
                     "m_date": {
                         "terms": {
                             "field": "interactions.interact_date"
                         }
                     }
                 }
             }
         }
     }
    

    The example doc should add 1 for 2015-01-02 and 2 for 2015-01-01 to the aggregation.

  2. Number of unique users interacting per day. If a specific user interacted with several articles on the same day, the user should be counted only once. In Postgres it's a simple query for a table with 3 columns [user_id, article_id, interact_date]:

     SELECT dt, count(uid)
     FROM (SELECT interact_date::TIMESTAMP::DATE dt, user_id uid FROM interactions
             GROUP BY interact_date::TIMESTAMP::DATE, user_id) by_date
     GROUP BY dt;

    How can I do the same in Elasticsearch?
    The example doc should add **1** for **2015-01-02** and **1** for **2015-01-01** to the aggregation.
    
  3. How can I add interactions via _update without re-indexing the whole document?

  4. How can I filter users by specific articles, i.e. count a user once in the per-day aggregation only if they interacted with one of the specified articles?
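
One possible approach to items 2 and 4, sketched in Python under a few assumptions: every user is stored as exactly one document, so counting parent documents per day with a `reverse_nested` aggregation is the same as counting distinct users, and the cluster is old enough that `date_histogram` still takes `interval` (newer releases name it `calendar_interval`). The helper names `unique_users_query` and `expected_unique_users`, and the aggregation names `interactions`, `selected`, `per_day` and `users`, are hypothetical. The first function only builds a request body; the second is a plain-Python reference for the numbers the query is expected to return.

```python
import json
from collections import defaultdict


def unique_users_query(article_ids=None):
    """Build a search body that counts user documents per day.

    Hypothetical helper: it returns a dict and sends nothing to a cluster.
    """
    # Inside the nested scope each bucket counts interactions. reverse_nested
    # steps back to the parent user documents, so its doc_count counts a user
    # once per day regardless of how many interactions fall on that day.
    per_day = {
        "date_histogram": {
            "field": "interactions.interact_date",
            "interval": "day",  # assumption: older parameter name
        },
        "aggs": {"users": {"reverse_nested": {}}},
    }
    if article_ids is None:
        inner = {"per_day": per_day}
    else:
        # Item 4: keep only interactions with the specified articles
        # before bucketing by day.
        inner = {
            "selected": {
                "filter": {
                    "terms": {"interactions.article_id": list(article_ids)}
                },
                "aggs": {"per_day": per_day},
            }
        }
    return {
        "size": 0,
        "aggs": {
            "interactions": {
                "nested": {"path": "interactions"},
                "aggs": inner,
            }
        },
    }


def expected_unique_users(docs, article_ids=None):
    """Plain-Python reference: distinct users per day, optionally filtered."""
    users_by_day = defaultdict(set)
    for doc in docs:
        for hit in doc["interactions"]:
            if article_ids is None or hit["article_id"] in article_ids:
                # Keep only the YYYY-MM-DD part of the date string.
                users_by_day[hit["interact_date"][:10]].add(doc["user_id"])
    return {day: len(users) for day, users in sorted(users_by_day.items())}


# The example document from this post.
example = {
    "user_id": 20,
    "interactions": [
        {"article_id": "111", "interact_date": "2015-01-01"},
        {"article_id": "111", "interact_date": "2015-01-02"},
        {"article_id": "222", "interact_date": "2015-01-01"},
    ],
}

print(json.dumps(unique_users_query(["222"]), indent=2))
print(expected_unique_users([example]))           # {'2015-01-01': 1, '2015-01-02': 1}
print(expected_unique_users([example], {"222"}))  # {'2015-01-01': 1}
```

If the sketch holds, the per-day user count would be read from the `doc_count` of each `users` bucket rather than from the `doc_count` of the day bucket itself, which still counts interactions.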

Thank you

