Hello guys.
When I'm using the following aggregation:
"aggs": {
  "2": {
    "terms": {
      "field": "name",
      "size": 100,
      "order": {
        "1": "desc"
      }
    },
    "aggs": {
      "1": {
        "avg": {
          "field": "value"
        }
      },
      "3": {
        "date_histogram": {
          "field": "@timestamp",
          "interval": "1d",
          "time_zone": "UTC",
          "min_doc_count": 1
        },
        "aggs": {
          "1": {
            "avg": {
              "field": "value"
            }
          }
        }
      }
    }
  }
}
I'm seeing missing data blocks (gaps) on the chart, because in some indices the highlighted block is not present in this top 100.
Is there some way to apply the aggregation to all of the data, rather than to each index inside the index pattern separately?
Or how can I get data for all 100 items without any of them being skipped?
Hi, thanks for the reply. Can you show me where I can apply shard_size in this aggregation structure?
Right alongside your "size": 100 parameter.
That didn't help; I still see the data gaps.
Here is the updated agg object:
"aggs": {
  "2": {
    "terms": {
      "field": "name",
      "size": 100,
      "shard_size": 500,
      "order": {
        "1": "desc"
      }
    },
    "aggs": {
      "1": {
        "avg": {
          "field": "value"
        }
      },
      "3": {
        "date_histogram": {
          "field": "@timestamp",
          "interval": "1d",
          "time_zone": "UTC",
          "min_doc_count": 1
        },
        "aggs": {
          "1": {
            "avg": {
              "field": "value"
            }
          }
        }
      }
    }
  }
}
You'll likely need to increase it further. There's a danger that you will use a lot of memory and trigger a circuit-breaker exception if you have a lot of unique terms; we'd then need to talk more about different strategies.
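A toy model may help show why a larger shard_size can close the gaps. The Python sketch below is a simplification, not Elasticsearch's actual implementation: it assumes each index (shard) ranks only its own terms by their local avg of value and returns the top shard_size of them, and that the coordinating node merges those partial buckets and keeps the global top size. The function terms_agg_by_avg and the DAILY_INDICES data are hypothetical.

```python
from collections import defaultdict

# Hypothetical data: three daily indices, each mapping term -> list of "value" fields.
DAILY_INDICES = [
    {"a": [10], "b": [9], "c": [1]},
    {"a": [10], "b": [1], "c": [8]},  # on this day, b has a low average
    {"a": [10], "b": [9], "c": [1]},
]


def terms_agg_by_avg(shards, size, shard_size):
    """Toy model of a terms agg ordered by an avg sub-agg (descending).

    Returns {term: set of shard indexes whose data made it into the term's bucket}
    for the top `size` terms after merging.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    contributors = defaultdict(set)
    for i, shard in enumerate(shards):
        # Each shard ranks only its own terms, by their local average.
        ranked = sorted(shard, key=lambda t: sum(shard[t]) / len(shard[t]), reverse=True)
        for term in ranked[:shard_size]:
            total[term] += sum(shard[term])
            count[term] += len(shard[term])
            contributors[term].add(i)
    # The coordinating node only sees what the shards returned.
    merged = sorted(total, key=lambda t: total[t] / count[t], reverse=True)
    return {t: contributors[t] for t in merged[:size]}


if __name__ == "__main__":
    # b is dropped by index 1, so its bucket only holds data from indices 0 and 2.
    print(terms_agg_by_avg(DAILY_INDICES, size=2, shard_size=2))
    # With shard_size covering every unique term, b gets data from all three indices.
    print(terms_agg_by_avg(DAILY_INDICES, size=2, shard_size=3))
```

With shard_size=2, term b never comes back from index 1, so whatever date_histogram buckets it has there are lost and show up as a gap. Once shard_size is at least the number of unique terms in each index (3 here), every index returns every term. The same reasoning explains the memory warning: a shard_size that covers every unique term means every shard ships every term.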
I tried "shard_size": 100000000 and nothing changed; the data gaps are still in place. But if I set size to 200, everything is fine and there are no data gaps. What I need, though, is 100 items without data gaps, not more.
Strange. Roughly how many unique "name" values are there? (The cardinality aggregation can tell you this.)
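For reference, a cardinality request on the name field could look like the sketch below, written as a Python dict to be sent as the search body. The aggregation name unique_names and the helper cardinality_request are made up for illustration. The count comes back under aggregations.unique_names.value; cardinality is approximate, though for around a hundred distinct values it should be very close.

```python
import json


def cardinality_request(field: str) -> dict:
    """Build a search body that counts the distinct values of `field`."""
    return {
        "size": 0,  # no hits needed, only the aggregation result
        "aggs": {
            "unique_names": {  # hypothetical aggregation name
                "cardinality": {"field": field}
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(cardinality_request("name"), indent=2))
```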
There are 104 unique names, so with size 100 I see data gaps and with 200 it works correctly. It seems that shard_size has no effect.
Are you checking the results for partial errors?
When you query 5 shards successfully, you should see 5/5 successes in the JSON response, e.g.:
"_shards": {
  "total": 5,
  "successful": 5,
  "skipped": 0,
  "failed": 0
}
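A minimal Python sketch of that check, assuming the parsed JSON response is available as a dict. The function name has_partial_failure is hypothetical; it only looks at the failed and successful counters shown above.

```python
def has_partial_failure(response: dict) -> bool:
    """Return True if the search response reports failed or missing shards."""
    shards = response["_shards"]
    # Any failed shard, or fewer successes than shards queried, means partial results.
    return shards.get("failed", 0) > 0 or shards["successful"] < shards["total"]


if __name__ == "__main__":
    ok = {"_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0}}
    print(has_partial_failure(ok))
```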
Yes: successful: 1, total: 1, nothing failed or skipped.
So you only have one index and one shard? That should make life even easier: there shouldn't be any of the usual concerns about terms accuracy, increasing shard_size, etc.
Two more questions: what Elasticsearch version are you using, and does it still fail to produce the correct results if you remove the min_doc_count: 1 parameter from your date_histogram agg?
The ES version is 5.2.2. Nothing seems to change when I remove min_doc_count.
So, assuming we have a passing test (size: 200) and a failing test (size: 100), let's try to simplify the aggregation and compare the results of these two queries.
Can you replace the date_histogram aggregation with a simple sum aggregation on the value field?
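One way to build the simplified request for both runs, sketched as a Python dict: the terms agg is unchanged apart from size (and an optional shard_size), and the date_histogram under "3" is swapped for a sum on value. The helper sum_by_name_request is hypothetical, and the agg keys "1", "2", "3" simply mirror the original request.

```python
from typing import Optional


def sum_by_name_request(size: int, shard_size: Optional[int] = None) -> dict:
    """Build the simplified search body: top names by avg value, each with a sum of value."""
    terms = {"field": "name", "size": size, "order": {"1": "desc"}}
    if shard_size is not None:
        terms["shard_size"] = shard_size
    return {
        "size": 0,
        "aggs": {
            "2": {
                "terms": terms,
                "aggs": {
                    "1": {"avg": {"field": "value"}},  # still used for ordering
                    "3": {"sum": {"field": "value"}},  # replaces the date_histogram
                },
            }
        },
    }


if __name__ == "__main__":
    failing = sum_by_name_request(100)
    passing = sum_by_name_request(200)
    print(failing["aggs"]["2"]["terms"], passing["aggs"]["2"]["terms"])
```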
I'd like to know if the reported sums differ for the size:200 and size:100 queries. That should at least tell us if we're looking at the same set of docs/terms in the 2 queries.
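Once both responses are back, a small helper like the one below could list any term whose sum differs between the size: 100 and size: 200 runs. It assumes the response shape produced by the simplified request (terms buckets under "2", the sum under "3"); the names sums_by_term and differing_sums are hypothetical. On a single shard, every term in the size: 100 result should normally also appear in the size: 200 result, so the comparison walks the smaller one.

```python
def sums_by_term(response: dict) -> dict:
    """Map each terms-bucket key to its sum sub-aggregation value."""
    return {b["key"]: b["3"]["value"] for b in response["aggregations"]["2"]["buckets"]}


def differing_sums(resp_small: dict, resp_large: dict, tol: float = 1e-9) -> dict:
    """Return {term: (small_sum, large_sum_or_None)} for terms whose sums disagree."""
    small = sums_by_term(resp_small)
    large = sums_by_term(resp_large)
    return {
        term: (s, large.get(term))
        for term, s in small.items()
        if term not in large or abs(s - large[term]) > tol
    }
```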
Thanks! "shard_size" helped in the case of multiple indices.