Aggregation is skipping needed data and not giving expected results

Grimmjow13r · March 20, 2018, 6:29pm

Hello guys.

When I use the following aggregation:

  "2": {
    "terms": {
      "field": "name",
      "size": 100,
      "order": {
        "1": "desc"
      }
    },
    "aggs": {
      "1": {
        "avg": {
          "field": "value"
        }
      },
      "3": {
        "date_histogram": {
          "field": "@timestamp",
          "interval": "1d",
          "time_zone": "UTC",
          "min_doc_count": 1
        },
        "aggs": {
          "1": {
            "avg": {
              "field": "value"
            }
          }
        }
      }
    }
  }

I end up with missing blocks of data on the chart, because in some indices
the highlighted block is not among the top 100.

Is there a way to apply the aggregation to all the data at once, rather than to each index in the index pattern separately?
Or how can I get data for all 100 items without any being skipped?

Waseem · March 21, 2018, 7:57am

+1
Same issue here.

Mark_Harwood · March 21, 2018, 10:34am

Try increasing the shard_size parameter. See https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#search-aggregations-bucket-terms-aggregation-approximate-counts
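
As a rough illustration of why shard_size can matter when several shards or indices are involved, here is a toy Python model (not Elasticsearch code). It assumes a terms aggregation ordered by doc count, which is simpler than the avg ordering used in this thread, and the per-shard counts are made up. Each shard reports only its own top shard_size terms, so a term that is cut from one shard's list loses that shard's contribution after the merge.

```python
from collections import Counter


def terms_agg(shards, size, shard_size):
    """Toy model of a terms aggregation ordered by doc count."""
    merged = Counter()
    for shard in shards:
        # Each shard only reports its own top `shard_size` terms.
        for term, count in Counter(shard).most_common(shard_size):
            merged[term] += count
    # The coordinating node merges the partial lists and keeps the top `size`.
    return merged.most_common(size)


# Hypothetical per-shard doc counts for the "name" field.
shards = [
    {"a": 10, "b": 8, "c": 7},
    {"c": 9, "d": 6, "a": 5},
]

truncated = dict(terms_agg(shards, size=2, shard_size=2))
complete = dict(terms_agg(shards, size=2, shard_size=3))

print(truncated)  # counts built from truncated per-shard lists
print(complete)   # counts when every shard reports all of its terms
```

In this made-up data, "a" and "c" are undercounted with shard_size=2 because each is dropped from one shard's list. With a single shard there is nothing to merge, so this effect should not occur.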

Grimmjow13r · March 22, 2018, 11:36am

Hi, thanks for the reply. Can you show me where I can apply shard_size in this aggregation structure?

Mark_Harwood · March 22, 2018, 11:38am

Right alongside your "size": 100 parameter.

Grimmjow13r · March 22, 2018, 11:41am

That didn't help :( I still see the data gaps.

Here is the updated agg object:

      "2": {
        "terms": {
          "field": "name",
          "size": 100,
          "shard_size": 500,
          "order": {
            "1": "desc"
          }
        },
        "aggs": {
          "1": {
            "avg": {
              "field": "value"
            }
          },
          "3": {
            "date_histogram": {
              "field": "@timestamp",
              "interval": "1d",
              "time_zone": "UTC",
              "min_doc_count": 1
            },
            "aggs": {
              "1": {
                "avg": {
                  "field": "value"
                }
              }
            }
          }
        }
      }

Mark_Harwood · March 22, 2018, 11:55am

You'll likely need to increase it. There's a danger that you use a lot of memory and trigger a circuit-breaker exception if you have a lot of unique terms, in which case we'll need to talk more about different strategies.

Grimmjow13r · March 22, 2018, 12:03pm

I tried "shard_size": 100000000 and nothing changed: the data gaps are still in the same places. But if I set size to 200, everything is fine and there are no data gaps. What I need, though, is 100 items without data gaps, not more.

Mark_Harwood · March 22, 2018, 12:28pm

Strange. Roughly how many unique "name" values are there? (The cardinality aggregation can help tell you this)
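
For reference, a minimal sketch of a request body for that count, written as a Python dict so it can be dumped to JSON. The aggregation name unique_names is a placeholder chosen for this example, and "size": 0 is an assumption that only the aggregation result is wanted. Note that the cardinality aggregation returns an approximate count.

```python
import json

# Request body for a search that only returns the number of distinct "name" values.
cardinality_body = {
    "size": 0,  # skip the hits, return aggregations only
    "aggs": {
        # "unique_names" is a hypothetical label; any name works here.
        "unique_names": {"cardinality": {"field": "name"}}
    },
}

print(json.dumps(cardinality_body, indent=2))
```

The count should then appear in the response under aggregations.unique_names.value.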

Grimmjow13r · March 22, 2018, 1:04pm

There are 104 unique names, so with size 100 I see data gaps and with 200 it works correctly. It seems that shard_size doesn't affect anything.

Mark_Harwood · March 22, 2018, 1:15pm

Are you checking the results for partial errors?
When you query 5 shards successfully you should see 5/5 successes in the JSON response, e.g.:

  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  }
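
If the response is already parsed into a Python dict, a small helper along these lines could flag partial results. This is only a sketch: the function name shard_problems and the sample responses are invented for illustration, and it looks at nothing beyond the _shards fields shown above.

```python
def shard_problems(response):
    """Return a list of human-readable problems found in the `_shards` section."""
    shards = response["_shards"]
    problems = []
    if shards["successful"] < shards["total"]:
        problems.append(
            f'only {shards["successful"]} of {shards["total"]} shards succeeded'
        )
    if shards.get("failed", 0) > 0:
        problems.append(f'{shards["failed"]} shard(s) failed')
    return problems


# Hypothetical responses, reduced to the part that matters here.
ok = {"_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0}}
partial = {"_shards": {"total": 5, "successful": 4, "skipped": 0, "failed": 1}}

print(shard_problems(ok))
print(shard_problems(partial))
```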

Grimmjow13r · March 22, 2018, 2:01pm

Yeah: successful: 1, total: 1, no failed or skipped.

Mark_Harwood · March 22, 2018, 2:14pm

So you only have one index and one shard? That should make life even easier - there shouldn't be any of the usual concerns over terms accuracy and increasing shard_size etc.

Two more questions: what Elasticsearch version are you using, and does it still fail to produce the correct results if you remove the min_doc_count: 1 parameter from your date_histogram agg?

Grimmjow13r · March 22, 2018, 2:20pm

The ES version is 5.2.2. Nothing seems to have changed when I removed min_doc_count.

Mark_Harwood · March 22, 2018, 2:40pm

So assuming we have a passing test (size:200) and a failing test (size:100) let's try and simplify the aggregation to compare the results of these collections.

Can you replace the date_histogram aggregation with a simple sum aggregation on the value field?
I'd like to know if the reported sums differ for the size:200 and size:100 queries. That should at least tell us if we're looking at the same set of docs/terms in the 2 queries.
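
One way to run that comparison, sketched in Python under a few assumptions: the request bodies are built as dicts, the sum keeps the aggregation id "3" from the original query, "size": 0 is added to skip the hits, and the helper names are invented for this example. The responses are assumed to be already parsed into dicts with the usual aggregations.<id>.buckets layout.

```python
def build_body(size):
    """Simplified query: terms on "name" with avg ("1") and sum ("3") sub-aggs."""
    return {
        "size": 0,  # assumption: only aggregation results are needed
        "aggs": {
            "2": {
                "terms": {"field": "name", "size": size, "order": {"1": "desc"}},
                "aggs": {
                    "1": {"avg": {"field": "value"}},
                    "3": {"sum": {"field": "value"}},
                },
            }
        },
    }


def sums_by_term(response):
    """Map each term bucket key to its reported sum."""
    buckets = response["aggregations"]["2"]["buckets"]
    return {bucket["key"]: bucket["3"]["value"] for bucket in buckets}


def differing_terms(response_a, response_b):
    """Terms present in both responses whose sums are not equal."""
    a, b = sums_by_term(response_a), sums_by_term(response_b)
    return sorted(key for key in a.keys() & b.keys() if a[key] != b[key])


body_100 = build_body(100)
body_200 = build_body(200)
```

Sending body_100 and body_200 to the same index and passing both responses to differing_terms should return an empty list if the two queries see the same documents for the terms they share.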

Grimmjow13r · March 29, 2018, 9:42am

Thanks! "shard_size" helped in the case of multiple indices.

