[Aggregation] Filtering by date range and counting nested documents


#1

Hello guys,

I am facing up a problem for which I have been unable to find an answer for weeks now.
Let's say I have an index named "users", and a type "profiles", where each document represents a user profile for an application.
I keep track of some important "events", that are stored and indexed in "profiles" type as nested documents.

The mappings of "users" index:

PUT users
{
  "mappings" : {
    "profiles": {
      "properties": {
        "firstname": {
          "type": "string"
        },
        "lastname": {
          "type": "string"
        },
        "events": {
          "type": "nested",
          "properties": {
            "date": {
              "type": "date",
              "format": "yyyy-MM-dd HH:mm:ss"
            },
            "type": {
              "type": "byte"
            }
          }
        }
      }
    }
  }
}

Each new event coming up for a profile will be stored as a new event, with its date and type.
Now the use case: I want to know, from 2015-05-01 to 2015-06-01, how much profiles did trigger how much events ?

Example of documents containing test data:

POST users/profiles/USER1
{
  "firstname": "Carlos",
  "lastname": "Rodriguez",
  "events": [
  	{
  	  "date": "2015-04-13 17:41:21",
  	  "type": 1
  	},
  	{
  	  "date": "2015-05-02 14:14:03",
  	  "type": 2
  	},
  	{
  	  "date": "2015-05-09 09:37:57",
  	  "type": 2
  	}
  ]
}

POST users/profiles/USER2
{
  "firstname": "Frank",
  "lastname": "Bar",
  "events": [
  	{
  	  "date": "2015-05-19 11:24:19",
  	  "type": 2
  	}
  ]
}

POST users/profiles/USER3
{
  "firstname": "Jason",
  "lastname": "Foo",
  "events": [
  	{
  	  "date": "2015-05-14 13:21:14",
  	  "type": 1
  	}
  ]
}

The following aggregation shoud be a good start to play arround with the problem:

GET users/profiles/_search
{
  "size": 0,
  "aggs": {
    "nested-agg": {
      "nested": {
        "path": "events"
      },
      "aggs": {
        "filtered": {
          "range": {
            "field": "events.date",
            "ranges": [
              {
                "from": "2015-05-01 00:00:00",
                "to": "2015-06-01 00:00:00"
              }
            ]
          },
          "aggs": {
            "by-type": {
              "terms": {
                "field": "events.type"
              },
              "aggs": {
                "reversed": {
                  "reverse_nested": {}
                }
              }
            }
          }
        }
      }
    }
  }
}

And with the (truncated) result, I am able to know that:
2 profiles have triggered a total of 3x "Event Type 2"
1 profile has triggered a total of 1x "Event Type 1"

"by-type": {
  "doc_count_error_upper_bound": 0,
  "sum_other_doc_count": 0,
  "buckets": [
    {
      "key": 2,
      "doc_count": 3,
      "reversed": {
         "doc_count": 2
      }
    },
    {
      "key": 1,
      "doc_count": 1,
      "reversed": {
         "doc_count": 1
      }
    }
  ]
}

I need a result like this:

1 profile has triggered 2x "Event Type 2" (USER1)
1 profile has triggered 1x "Event Type 2" (USER2)
1 profile has triggered 1x "Event Type 1" (USER3)

Is it possible to achieve this without using a script ?
I know I should somehow index for each profiles how many time they triggered each event, but the need
to have a date filter makes it, to me, impossible.


(system) #2