Hi,
I'm trying to get a better understanding of aggregations, so here are a
couple of questions that came up recently.
Question 1:
I have some time-based data that I chart using aggregations. The data may
be sparse, so I set min_doc_count to 0 to get empty buckets back anyway.
I've noticed that empty buckets are filled in between records, but not
before the first record or after the last record in the range.
For example, with a query similar to the one below, if there are no records
after 3/15/14T16:15, the last bucket returned is the one for 3/15/14T16:15.
On the other hand, if there is a gap between the start time and
3/15/14T16:15, I do get a bucket with a doc_count of 0 (as expected).
POST _all/summary_phys/_search
{
  "aggs": {
    "events_by_date": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "300s",
        "min_doc_count": 0
      },
      "aggs": {
        "events_by_host": {
          "terms": {
            "field": "host.raw"
          },
          "aggs": {
            "avg_used": {
              "avg": {
                "field": "used"
              }
            },
            "max_used": {
              "max": {
                "field": "used"
              }
            }
          }
        }
      }
    }
  }
}
Not getting zero-count buckets at the start and end of the range seems
contrary to the documented purpose of min_doc_count. Am I doing something
wrong?
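To make the behaviour in Question 1 concrete, here is a small Python model of what I observe. It is a sketch of the observed output, not of how Elasticsearch implements it: timestamps are floored to 300 s buckets, and empty buckets are filled in only between the first and last populated bucket. The `bounds` argument is hypothetical; it only shows the extra leading and trailing empty buckets I expected to get.

```python
from datetime import datetime, timedelta, timezone

INTERVAL = 300  # seconds, as in "interval": "300s"


def bucket_key(ts: datetime) -> int:
    """Start of the 300 s bucket containing ts, in epoch seconds.

    (Elasticsearch itself reports date_histogram keys in epoch milliseconds.)
    """
    epoch = int(ts.timestamp())
    return epoch - epoch % INTERVAL


def histogram(timestamps, bounds=None):
    """Model of date_histogram with min_doc_count = 0, as observed.

    Empty buckets are only filled in between the first and last populated
    bucket. `bounds` is a hypothetical (min, max) pair of epoch seconds used
    to show the leading/trailing empty buckets I expected to get.
    """
    counts = {}
    for ts in timestamps:
        key = bucket_key(ts)
        counts[key] = counts.get(key, 0) + 1
    if not counts:
        return []  # no documents, no buckets
    lo, hi = min(counts), max(counts)
    if bounds is not None:
        lo = min(lo, bounds[0] - bounds[0] % INTERVAL)
        hi = max(hi, bounds[1] - bounds[1] % INTERVAL)
    # One (key, doc_count) pair per interval from lo to hi inclusive.
    return [(k, counts.get(k, 0)) for k in range(lo, hi + INTERVAL, INTERVAL)]


start = datetime(2014, 3, 15, 16, 0, tzinfo=timezone.utc)
events = [start, start + timedelta(minutes=2), start + timedelta(minutes=15)]
print([c for _, c in histogram(events)])
```

Without `bounds`, the 16:05 and 16:10 gaps get zero-count buckets but nothing appears before 16:00 or after 16:15. Passing bounds that start 10 minutes earlier and end 10 minutes later adds two empty buckets on each side, which is what I was expecting from min_doc_count alone.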
Question 2:
If I also set min_doc_count to 0 on the inner (terms) aggregation, with the
search still limited to a specific doc type (summary_phys, in the URL path):
POST _all/summary_phys/_search
{
  "aggs": {
    "events_by_date": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "300s",
        "min_doc_count": 0
      },
      "aggs": {
        "events_by_host": {
          "terms": {
            "field": "host.raw",
            "min_doc_count": 0
          },
          "aggs": {
            "avg_used": {
              "avg": {
                "field": "used"
              }
            },
            "max_used": {
              "max": {
                "field": "used"
              }
            }
          }
        }
      }
    }
  }
}
I get buckets for hosts that never appear in this doc type. For example,
this doc type has only three host values [compute-4, compute-2, compute-3],
but I also get buckets, with a doc_count of 0, for hosts from other doc
types:
"events_by_host": {
  "buckets": [
    {
      "key": "compute-4",
      "doc_count": 11,
      "max_used": {
        "value": 4608
      },
      "avg_used": {
        "value": 3677.090909090909
      }
    },
    {
      "key": "compute-2",
      "doc_count": 8,
      "max_used": {
        "value": 4608
      },
      "avg_used": {
        "value": 2304
      }
    },
    {
      "key": "compute-3",
      "doc_count": 2,
      "max_used": {
        "value": 4608
      },
      "avg_used": {
        "value": 4608
      }
    },
    {
      "key": "10.10.11.22:49509",
      "doc_count": 0,
      "max_used": {
        "value": null
      },
      "avg_used": {
        "value": null
      }
    },
    {
      "key": "controller",
      "doc_count": 0,
      "max_used": {
        "value": null
      },
      "avg_used": {
        "value": null
      }
    },
    {
      "key": "object-1",
      "doc_count": 0,
      "max_used": {
        "value": null
      },
      "avg_used": {
        "value": null
      }
    }
  ]
}
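Judging from the output above, min_doc_count: 0 on the terms aggregation seems to create a bucket for every host value in the searched indices, whatever its doc type, with doc_count 0 and null metrics when no matching document has that host. A rough Python model of that reading follows; the documents and the `other_type` doc type are made up for illustration, and the model ignores the terms `size` limit.

```python
from statistics import mean

# Made-up documents: (doc type, host, used). Only "summary_phys" is searched;
# "other_type" stands in for the other doc types in the same indices.
DOCS = [
    ("summary_phys", "compute-4", 4608),
    ("summary_phys", "compute-4", 2746),
    ("summary_phys", "compute-2", 4608),
    ("other_type", "controller", 1024),
    ("other_type", "object-1", 512),
]


def terms_agg(docs, doc_type, min_doc_count=1):
    """Rough model of the host terms agg with avg_used/max_used sub-aggs."""
    # Values of "used" per host, for documents matching the searched doc type.
    matched = {}
    for dtype, host, used in docs:
        if dtype == doc_type:
            matched.setdefault(host, []).append(used)
    # With min_doc_count == 0, every host value in the indices becomes a
    # candidate bucket, not just hosts of the searched doc type.
    candidates = {host for _, host, _ in docs} if min_doc_count == 0 else set(matched)
    buckets = []
    for host in candidates:
        values = matched.get(host, [])
        if len(values) < min_doc_count:
            continue
        buckets.append({
            "key": host,
            "doc_count": len(values),
            # Empty buckets have no values, so the metrics are null (None).
            "avg_used": {"value": mean(values) if values else None},
            "max_used": {"value": max(values) if values else None},
        })
    # Order by doc_count descending, then key, for a stable result.
    buckets.sort(key=lambda b: (-b["doc_count"], b["key"]))
    return buckets


for b in terms_agg(DOCS, "summary_phys", min_doc_count=0):
    print(b["key"], b["doc_count"], b["avg_used"]["value"], b["max_used"]["value"])
```

With the default min_doc_count of 1, the same model returns only compute-4 and compute-2.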
Is there a way to make the inner aggregation create buckets only for hosts
that occur in the searched doc type?
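In the meantime, one client-side workaround I could apply is sketched below, assuming the response shape of the query above (aggregations > events_by_date > buckets > events_by_host > buckets). In every time bucket it keeps only the hosts that have at least one matching document somewhere in the response, so zero-count buckets for this doc type's hosts survive while hosts from other doc types are dropped. A host of this doc type with no documents anywhere in the searched range would be dropped too. The function name `keep_hosts_seen_in_results` and the sample response are my own.

```python
import json


def keep_hosts_seen_in_results(response: dict) -> dict:
    """Drop inner host buckets for hosts with no matching docs in the whole response.

    Assumes the shape of the query above:
    aggregations > events_by_date > buckets[] > events_by_host > buckets[].
    Time buckets themselves are left untouched, so empty 5-minute slots remain.
    """
    time_buckets = response["aggregations"]["events_by_date"]["buckets"]
    # Hosts that matched at least one document (of the searched doc type) somewhere.
    seen = {
        host["key"]
        for tb in time_buckets
        for host in tb["events_by_host"]["buckets"]
        if host["doc_count"] > 0
    }
    for tb in time_buckets:
        hosts = tb["events_by_host"]
        # Keep zero-count buckets only for hosts that appear elsewhere in the results.
        hosts["buckets"] = [h for h in hosts["buckets"] if h["key"] in seen]
    return response


# A made-up response in the same shape as the real one.
raw = """
{"aggregations": {"events_by_date": {"buckets": [
  {"key": 1394899200000, "doc_count": 2, "events_by_host": {"buckets": [
    {"key": "compute-4", "doc_count": 2, "max_used": {"value": 4608}, "avg_used": {"value": 3677}},
    {"key": "compute-2", "doc_count": 0, "max_used": {"value": null}, "avg_used": {"value": null}},
    {"key": "controller", "doc_count": 0, "max_used": {"value": null}, "avg_used": {"value": null}}]}},
  {"key": 1394899500000, "doc_count": 1, "events_by_host": {"buckets": [
    {"key": "compute-2", "doc_count": 1, "max_used": {"value": 4608}, "avg_used": {"value": 4608}},
    {"key": "compute-4", "doc_count": 0, "max_used": {"value": null}, "avg_used": {"value": null}},
    {"key": "object-1", "doc_count": 0, "max_used": {"value": null}, "avg_used": {"value": null}}]}}
]}}}
"""
cleaned = keep_hosts_seen_in_results(json.loads(raw))
for tb in cleaned["aggregations"]["events_by_date"]["buckets"]:
    print(tb["key"], [h["key"] for h in tb["events_by_host"]["buckets"]])
```

This only hides the extra buckets after the fact; the shards still compute them.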
Thanks in advance...
John
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/856133dc-c4ae-4cfc-adab-39453671d76d%40googlegroups.com.