Hi, I have a single index with data (let's say about 20 million documents for the last month). Example mapping:
{
"date-aggregation": {
"mappings": {
"doc": {
"properties": {
"source_date": {
"type": "date"
},
"title": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"other_fields" {...},
"source_type": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
I have 1500 keywords and 4 date ranges, and I need to generate a date histogram for every combination. To be more precise, it should look like this:
- Get first keyword.
- Search for (filter) all documents with this keyword.
- Get a date histogram for each of these ranges:
  - last month, interval: day
  - last week, interval: day
  - last day, interval: hour
  - last hour, interval: 5 minutes
- For every histogram, calculate the percentage by which the given keyword has increased (or decreased) over the given time range. Example: "elasticsearch" in the last year histogram increased by about 12% (difference between the start and the end date).
- Results should be sorted by the percentage values from point 4.
- Do the same for every keyword.
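To make points 4 and 5 concrete, here is a minimal Python sketch of how I picture the delta and the sorting. It assumes each histogram has already been reduced to a list of per-bucket document counts, and that "difference between the start and the end date" means comparing the first and the last bucket. The names `percent_change`, `rank_keywords` and `page` are only illustrative and not part of any ES API.

```python
from typing import Dict, List, Optional, Tuple


def percent_change(doc_counts: List[int]) -> Optional[float]:
    """Percentage change between the first and last bucket of one histogram.

    Returns None when there are fewer than two buckets or the first bucket is
    empty, because the change is undefined then (my assumption about how to
    treat those cases).
    """
    if len(doc_counts) < 2 or doc_counts[0] == 0:
        return None
    first, last = doc_counts[0], doc_counts[-1]
    return (last - first) * 100.0 / first


def rank_keywords(histograms: Dict[str, List[int]]) -> List[Tuple[str, Optional[float]]]:
    """Sort keywords by delta, largest increase first; undefined deltas go last."""
    deltas = {kw: percent_change(counts) for kw, counts in histograms.items()}
    return sorted(
        deltas.items(),
        key=lambda item: (item[1] is None, -(item[1] or 0.0)),
    )


def page(ranked: list, page_number: int, page_size: int) -> list:
    """Slice one page (1-based page number) out of the ranked list."""
    start = (page_number - 1) * page_size
    return ranked[start:start + page_size]
```

In this sketch the delta is computed outside ES, so sorting and paging would also happen in the application by slicing the sorted list.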
It should be possible to use pagination in ES.
Currently, my query for a single keyword and a single date range looks like this:
{
"query": {
"bool": {
"filter": [{
"bool": {
"should": [
{"match": {"source_type": "source_1"}},
{"match": {"source_type": "source_2"}}
]
}
},
{
"range": {
"source_date": {
"gte": "now-1w",
"lte": "now"
}
}
}],
"must": [{
"multi_match": {
"query": "elasticsearch",
"type": "phrase"
}
}]
}
},
"aggs": {
"hour": {
"date_histogram": {
"field": "source_date",
"interval": "day"
}
}
}
}
Creating a separate query for every keyword and every time range is not efficient. Any ideas how I can achieve this using as few queries as possible?
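One direction I am considering, untested at this volume, is to cover all four ranges in one request per keyword by wrapping each date histogram in a `filter` aggregation, and then to batch the per-keyword requests in the `_msearch` NDJSON format. The Python sketch below only builds the request bodies. The aggregation names (`month`, `week`, `day`, `hour`, `histogram`) and the helpers `build_body` and `build_msearch_payload` are my own hypothetical names. It keeps the `interval` parameter from my query above; newer ES versions may expect `calendar_interval` or `fixed_interval` instead.

```python
import json
from typing import Dict, List, Tuple

# (date-math lower bound, histogram interval) for the four ranges in this post.
RANGES: Dict[str, Tuple[str, str]] = {
    "month": ("now-1M", "day"),
    "week": ("now-1w", "day"),
    "day": ("now-1d", "hour"),
    "hour": ("now-1h", "5m"),
}


def build_body(keyword: str) -> dict:
    """Build one search body for one keyword that covers all four ranges."""
    aggs = {}
    for name, (gte, interval) in RANGES.items():
        aggs[name] = {
            # Narrow the matched documents to this range ...
            "filter": {"range": {"source_date": {"gte": gte, "lte": "now"}}},
            # ... then bucket them with the interval that belongs to it.
            "aggs": {
                "histogram": {
                    "date_histogram": {"field": "source_date", "interval": interval}
                }
            },
        }
    return {
        "size": 0,  # only the aggregations are needed, not the hits
        "query": {
            "bool": {
                "filter": [
                    {"bool": {"should": [
                        {"match": {"source_type": "source_1"}},
                        {"match": {"source_type": "source_2"}},
                    ]}},
                    # The widest range (last month) bounds the whole query.
                    {"range": {"source_date": {"gte": "now-1M", "lte": "now"}}},
                ],
                "must": [{"multi_match": {"query": keyword, "type": "phrase"}}],
            }
        },
        "aggs": aggs,
    }


def build_msearch_payload(keywords: List[str], index: str = "date-aggregation") -> str:
    """Build NDJSON for _msearch: one header line and one body line per keyword."""
    lines = []
    for keyword in keywords:
        lines.append(json.dumps({"index": index}))
        lines.append(json.dumps(build_body(keyword)))
    return "\n".join(lines) + "\n"  # the payload has to end with a newline
```

This would bring 1500 x 4 searches down to 1500 search bodies sent in a handful of `_msearch` calls, but each keyword is still evaluated separately, so I am not sure how much it actually saves.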
Ideally, the response (from API, not necessarily from ES) should look like this:
{
"month": {
"keyword_1": {
"date_histogram": {},
"delta": "68%"
},
"keyword_2": {
"date_histogram": {},
"delta": "12%"
},
...
},
"week": {
"keyword_1": {
"date_histogram": {},
"delta": "18%"
},
"keyword_2": {
"date_histogram": {},
"delta": "-12%"
},
...
},
...
}
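To show how I imagine the API layer producing this shape, here is a rough Python sketch. It assumes one ES search response per keyword, each holding one filter aggregation per range (named `month`, `week`, `day`, `hour`) with a nested date histogram called `histogram`. Those names, as well as `format_delta` and `assemble`, are hypothetical and chosen only for illustration. The delta compares the first and the last bucket.

```python
from typing import Dict, List, Optional

RANGE_NAMES = ("month", "week", "day", "hour")


def format_delta(doc_counts: List[int]) -> Optional[str]:
    """First-to-last bucket change as a string such as "12%", or None if undefined."""
    if len(doc_counts) < 2 or doc_counts[0] == 0:
        return None
    change = (doc_counts[-1] - doc_counts[0]) * 100.0 / doc_counts[0]
    return f"{round(change)}%"


def assemble(responses: Dict[str, dict]) -> Dict[str, Dict[str, dict]]:
    """Regroup per-keyword search responses into {range: {keyword: {...}}}."""
    result: Dict[str, Dict[str, dict]] = {name: {} for name in RANGE_NAMES}
    for keyword, response in responses.items():
        for name in RANGE_NAMES:
            # Bucket list of the date histogram nested under this range's filter.
            buckets = response["aggregations"][name]["histogram"]["buckets"]
            result[name][keyword] = {
                "date_histogram": {b["key_as_string"]: b["doc_count"] for b in buckets},
                "delta": format_delta([b["doc_count"] for b in buckets]),
            }
    return result
```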
I assume it's not possible to get all the data with a single ES query, but I'm trying to find the best way to do this.
Thanks.