Aggregation is skipping needed data and not giving expected results

Hello guys.

When I use the following aggregation:

  "2": {
    "terms": {
      "field": "name",
      "size": 100,
      "order": {
        "1": "desc"
      }
    },
    "aggs": {
      "1": {
        "avg": {
          "field": "value"
        }
      },
      "3": {
        "date_histogram": {
          "field": "@timestamp",
          "interval": "1d",
          "time_zone": "UTC",
          "min_doc_count": 1
        },
        "aggs": {
          "1": {
            "avg": {
              "field": "value"
            }
          }
        }
      }
    }
  }
}

I get missing blocks of data on the chart, because in some indices
the highlighted block is not present in this top 100.

Is there some way to apply the aggregation to all the data, rather than separately to each index inside the index pattern?
Or how can I get data for all 100 items without anything being skipped?

+1
Same issue here.

Try increasing the shard_size parameter. See https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#search-aggregations-bucket-terms-aggregation-approximate-counts
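To illustrate why shard_size can matter, here is a toy model in Python. It is only a sketch under assumptions: it orders by document count (the aggregation in this thread orders by an avg sub-aggregation instead), and the function name `terms_agg` and the sample data are made up. The point is only that each shard truncates its own list before the results are merged.

```python
from collections import Counter

def terms_agg(shards, size, shard_size):
    """Toy model of a terms aggregation ordered by doc count.

    Each shard reports only its own top `shard_size` terms; the
    coordinating node sums what it received and keeps the top `size`.
    This is a simplification, not Elasticsearch's actual code.
    """
    merged = Counter()
    for shard in shards:
        for term, count in Counter(shard).most_common(shard_size):
            merged[term] += count
    return merged.most_common(size)

# Made-up data: two shards, three terms. True totals are a=7, b=8, c=8.
shards = [
    ["a"] * 6 + ["b"] * 4 + ["c"] * 3,
    ["c"] * 5 + ["b"] * 4 + ["a"] * 1,
]

print(terms_agg(shards, size=2, shard_size=2))  # "c" is lost on shard 1
print(terms_agg(shards, size=2, shard_size=3))  # every shard reports every term
```

With shard_size=2 the model returns "b" and "a", even though "c" has more documents overall; with shard_size=3 it returns "b" and "c".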

Hi, thanks for the reply. Can you show me where I can apply shard_size in this aggregation structure?

Right alongside your "size" : 100 parameter

That didn't help :( I still see the data gaps.

Here is the updated agg object:

      "2": {
        "terms": {
          "field": "name",
          "size": 100,
          "shard_size": 500,
          "order": {
            "1": "desc"
          }
        },
        "aggs": {
          "1": {
            "avg": {
              "field": "value"
            }
          },
          "3": {
            "date_histogram": {
              "field": "@timestamp",
              "interval": "1d",
              "time_zone": "UTC",
              "min_doc_count": 1
            },
            "aggs": {
              "1": {
                "avg": {
                  "field": "value"
                }
              }
            }
          }
        }
      }
    }

You'll likely need to increase it. There's a danger you can use a lot of memory and cause a circuit-breaker exception if you have a lot of unique terms - we'll then need to talk more about different strategies.

I tried "shard_size": 100000000 and nothing changed; the data gaps are in the same places. But if I set size to 200, everything is fine and there are no data gaps. What I need, though, is 100 items without data gaps, not more.

Strange. Roughly how many unique "name" values are there? (The cardinality aggregation can help tell you this)
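A minimal sketch of a request body that counts the unique values of a field with the cardinality aggregation, written as a Python dict so it can be printed as JSON. The aggregation name `unique_names` and the helper `cardinality_request` are made up for illustration. Note that cardinality returns an approximate count.

```python
import json

def cardinality_request(field):
    """Build a search body that only returns the unique-value count."""
    return {
        "size": 0,  # no hits needed, only the aggregation result
        "aggs": {
            # "unique_names" is an arbitrary, hypothetical aggregation name
            "unique_names": {"cardinality": {"field": field}}
        },
    }

print(json.dumps(cardinality_request("name"), indent=2))
```

The printed JSON can be sent as the body of a search request against the index in question.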

There are 104 unique names, so with size 100 I see data gaps and with 200 it works correctly. It seems that shard_size isn't affecting anything.

Are you checking the results for partial errors?
When you query 5 shards successfully you should see 5/5 successes in the JSON response, e.g.:

  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  }
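The check described above could be automated roughly like this. It is a sketch that assumes the response has already been parsed into a Python dict; the helper name `all_shards_ok` is made up.

```python
def all_shards_ok(response):
    """Return True when every queried shard answered without failure."""
    shards = response["_shards"]
    # "failed" may be absent in some responses, so default it to 0.
    return shards.get("failed", 0) == 0 and shards["successful"] == shards["total"]

# Sample responses trimmed down to the relevant section.
ok = {"_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0}}
partial = {"_shards": {"total": 5, "successful": 4, "skipped": 0, "failed": 1}}

print(all_shards_ok(ok))
print(all_shards_ok(partial))
```

A False result means the aggregation was computed from only part of the data, which could by itself explain missing buckets.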

Yeah: successful: 1, total: 1, none failed or skipped.

So you only have one index and one shard? That should make life even easier - there shouldn't be any of the usual concerns over terms accuracy and increasing shard_size etc.

Two more questions: what Elasticsearch version are you using, and does it still fail to produce the correct results if you remove the min_doc_count: 1 parameter from your date_histogram agg?

The ES version is 5.2.2. Nothing seemed to change when I removed min_doc_count.

So, assuming we have a passing test (size:200) and a failing test (size:100), let's try to simplify the aggregation and compare the results of the two.

Can you replace the date_histogram aggregation with a simple sum aggregation on the value field?
I'd like to know if the reported sums differ between the size:200 and size:100 queries. That should at least tell us whether we're looking at the same set of docs/terms in the 2 queries.
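One possible way to run that comparison, sketched in Python. The sub-aggregation name `total`, the helper names, and the two sample responses are all invented for illustration; real responses would come from running the simplified query with size 100 and size 200.

```python
def simplified_agg(size):
    """The thread's terms agg with the date_histogram swapped for a sum."""
    return {
        "2": {
            "terms": {"field": "name", "size": size, "order": {"1": "desc"}},
            "aggs": {
                "1": {"avg": {"field": "value"}},      # still needed for the ordering
                "total": {"sum": {"field": "value"}},  # "total" is a made-up name
            },
        }
    }

def sums_by_term(response):
    """Map each returned term to its reported sum."""
    buckets = response["aggregations"]["2"]["buckets"]
    return {b["key"]: b["total"]["value"] for b in buckets}

def compare(resp_a, resp_b):
    """Return (terms whose sums differ, terms only in a, terms only in b)."""
    a, b = sums_by_term(resp_a), sums_by_term(resp_b)
    differing = sorted(k for k in a.keys() & b.keys() if a[k] != b[k])
    return differing, sorted(a.keys() - b.keys()), sorted(b.keys() - a.keys())

# Invented responses, trimmed to the fields the helpers read.
resp_100 = {"aggregations": {"2": {"buckets": [
    {"key": "x", "total": {"value": 10.0}}]}}}
resp_200 = {"aggregations": {"2": {"buckets": [
    {"key": "x", "total": {"value": 10.0}},
    {"key": "y", "total": {"value": 3.0}}]}}}

print(compare(resp_100, resp_200))
```

If the first list is empty, the terms present in both responses were computed from the same documents, and the only difference is which terms made the cut.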

Thanks! "shard_size" helped in case of multiple indices.