Aggregations for multiple keywords and date ranges (histogram)

Hi, I have a single index with data (say, about 20 million documents for the last month). Example mapping:

{
	"date-aggregation": {
		"mappings": {
			"doc": {
				"properties": {
					"source_date": {
						"type": "date"
					},
					"title": {
						"type": "text",
						"fields": {
							"keyword": {
								"type": "keyword",
								"ignore_above": 256
							}
						}
					},
					"other_fields": {...},
					"source_type": {
						"type": "text",
						"fields": {
							"keyword": {
								"type": "keyword",
								"ignore_above": 256
							}
						}
					}
				}
			}
		}
	}
}

I have 1500 keywords and 4 date ranges, and I need to generate a date histogram for every combination. To be more precise, it should look like this:

  1. Get the first keyword.
  2. Search for (filter) all documents with this keyword.
  3. Get a date histogram for:
     • last month, interval: day
     • last week, interval: day
     • last day, interval: hour
     • last hour, interval: 5 minutes
  4. For every histogram, calculate (as a percentage) how much the given keyword has increased (or decreased) in the given time range. Example: "elasticsearch" in the last month histogram increased by about 12% (difference between the start and the end date).
  5. Sort the results by the values from point 4.
  6. Do the same for every keyword.
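
To make the delta calculation and the sorting step concrete, here is a minimal Python sketch. It assumes each histogram is the `buckets` list that a `date_histogram` aggregation returns (each bucket carrying a `doc_count`), and that "delta" means the change between the first and the last bucket. The function names `percent_change` and `rank_keywords` are hypothetical, and treating a zero starting count as "undefined" is my own assumption.

```python
from typing import Dict, List, Optional, Tuple


def percent_change(buckets: List[dict]) -> Optional[float]:
    """Percentage change between the first and last bucket's doc_count."""
    if len(buckets) < 2:
        return None  # not enough points to compare
    first = buckets[0]["doc_count"]
    last = buckets[-1]["doc_count"]
    if first == 0:
        return None  # assumption: growth from zero is treated as undefined
    return (last - first) / first * 100.0


def rank_keywords(
    histograms: Dict[str, List[dict]]
) -> List[Tuple[str, Optional[float]]]:
    """Sort keywords by delta, largest first; undefined deltas go last."""
    deltas = [(kw, percent_change(b)) for kw, b in histograms.items()]
    return sorted(deltas, key=lambda kd: (kd[1] is None, -(kd[1] or 0.0)))


if __name__ == "__main__":
    sample = {
        "keyword_1": [{"doc_count": 50}, {"doc_count": 84}],
        "keyword_2": [{"doc_count": 100}, {"doc_count": 112}],
    }
    for keyword, delta in rank_keywords(sample):
        print(keyword, delta)
```

Because the delta is computed outside Elasticsearch here, sorting and pagination would also happen on the API side in this sketch.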

It should be possible to use pagination in ES.

Currently, my query for a single keyword and a single date range looks like this:

{
	"query": {
		"bool": {
			"filter": [{
				"bool": {
					"should": [
						{"match": {"source_type": "source_1"}},
						{"match": {"source_type": "source_2"}}
					]
				}
			},
			{
				"range": {
					"source_date": {
						"gte": "now-1w",
						"lte": "now"
					}
				}
			}],
			"must": [{
				"multi_match": {
					"query": "elasticsearch",
					"type": "phrase"
				}
			}]
		}
	},
	"aggs": {
		"hour": {
			"date_histogram": {
				"field": "source_date",
				"interval": "day"
			}
		}
	}
}

Creating a separate query for every keyword and every time range is not efficient. Any ideas how I can achieve this using as few queries as possible?
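
One direction that might reduce the number of requests, offered as an untested sketch rather than a confirmed solution: a `filters` aggregation with one named filter per keyword and a `date_histogram` sub-aggregation, so that a single request per date range covers a whole batch of keywords. The Python below only builds the request bodies. `RANGES`, `build_body`, `chunked`, and the aggregation names `keywords` and `histogram` are hypothetical, the `source_type` filter from the query above is left out for brevity, and whether 1500 filters fit comfortably in one request is something to measure, hence the batching helper.

```python
from typing import Dict, Iterator, List

# Date ranges from the question: name -> (range start, histogram interval).
RANGES = {
    "month": ("now-1M", "day"),
    "week": ("now-1w", "day"),
    "day": ("now-1d", "hour"),
    "hour": ("now-1h", "5m"),
}


def build_body(keywords: List[str], range_name: str) -> Dict:
    """Build one search body covering every keyword for one date range."""
    start, interval = RANGES[range_name]
    return {
        "size": 0,  # only the aggregations are needed, not the hits
        "query": {
            "bool": {
                "filter": [
                    {"range": {"source_date": {"gte": start, "lte": "now"}}}
                ]
            }
        },
        "aggs": {
            "keywords": {
                # One named filter per keyword, reusing the phrase query.
                "filters": {
                    "filters": {
                        kw: {"multi_match": {"query": kw, "type": "phrase"}}
                        for kw in keywords
                    }
                },
                "aggs": {
                    "histogram": {
                        "date_histogram": {
                            "field": "source_date",
                            "interval": interval,
                        }
                    }
                },
            }
        },
    }


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]
```

The four bodies for one keyword batch could also be sent together through the `_msearch` endpoint, which would make it one round trip per batch instead of four.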

Ideally, the response (from API, not necessarily from ES) should look like this:

{
	"month": {
		"keyword_1": {
			"date_histogram": {},
			"delta": "68%"
		},
		"keyword_2": {
			"date_histogram": {},
			"delta": "12%"
		},
		...
	},
	"week": {
		"keyword_1": {
			"date_histogram": {},
			"delta": "18%"
		},
		"keyword_2": {
			"date_histogram": {},
			"delta": "-12%"
		},
		...
	},
	...
}
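
As an illustration of how the API layer might assemble the shape above, here is a small Python sketch. It assumes one Elasticsearch response per date range, where each response holds a `filters` aggregation named `keywords` (buckets keyed by keyword) with a `date_histogram` sub-aggregation named `histogram`. Those aggregation names, and the functions `format_delta` and `reshape`, are hypothetical.

```python
from typing import Any, Dict, List, Optional


def format_delta(buckets: List[dict]) -> Optional[str]:
    """Return the first-to-last change as a string such as '-12%'."""
    if len(buckets) < 2 or buckets[0]["doc_count"] == 0:
        return None  # assumption: undefined when there is no baseline
    first = buckets[0]["doc_count"]
    last = buckets[-1]["doc_count"]
    return f"{round((last - first) / first * 100)}%"


def reshape(responses: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Map {range_name: ES response} to {range_name: {keyword: {...}}}."""
    result: Dict[str, Dict[str, Any]] = {}
    for range_name, response in responses.items():
        per_keyword = response["aggregations"]["keywords"]["buckets"]
        result[range_name] = {
            keyword: {
                "date_histogram": bucket["histogram"]["buckets"],
                "delta": format_delta(bucket["histogram"]["buckets"]),
            }
            for keyword, bucket in per_keyword.items()
        }
    return result
```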

I assume it's not possible to get all the data with a single query to ES, but I'm trying to find the best way to do this.

Thanks.
