ES Aggregation (Bug?) - No buckets results at high "min_doc_count" and low "size"

Sinmson · August 22, 2017, 10:39am

Hey,
I have 2 different types of documents.

Request
Response

They share the same sessionid.

As a result, I want the documents aggregated and grouped by their sessionid. This is what I did:

a terms aggregation on sessionid, ordered by the lowest timestamp
top_hits to include the _source and sort the docs, newest first
min and max of @timestamp to sort the buckets later on
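For readers who cannot see the linked query, the three steps above can be sketched roughly as the request body below. This is only a sketch, not the original query: the aggregation names "sessions", "latest_hit" and "docs" are made up for illustration, "ealiest_hit" is the name mentioned later in this post, and it assumes sessionid is mapped so that a terms aggregation can run on it.

```python
import json


def build_session_aggregation(min_doc_count=2, size=10):
    """Build a search body that groups documents by sessionid.

    Names other than "ealiest_hit" are hypothetical; adjust the field
    names to your own mapping.
    """
    return {
        "size": 0,  # only the aggregation buckets are of interest here
        "aggs": {
            "sessions": {
                "terms": {
                    "field": "sessionid",
                    "size": size,
                    "min_doc_count": min_doc_count,
                    # order buckets by the sub-aggregation defined below
                    "order": {"ealiest_hit": "asc"},
                },
                "aggs": {
                    # lowest and highest timestamp per session
                    "ealiest_hit": {"min": {"field": "@timestamp"}},
                    "latest_hit": {"max": {"field": "@timestamp"}},
                    # the documents of the session, newest first
                    "docs": {
                        "top_hits": {
                            "size": 10,
                            "sort": [{"@timestamp": {"order": "desc"}}],
                        }
                    },
                },
            }
        },
    }


body = build_session_aggregation(min_doc_count=3)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request; only min_doc_count and size change between the two queries discussed here.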

For this I wrote this aggregation:

Aggregation Query 1

Aggregation Query 1 - Result

As long as "min_doc_count" = 2, it works well. It shows the aggregated request and response with their source, all sorted. But the result also contains buckets with a doc_count of 3. Those are failures where the response server sent a second response, for whatever reason. To analyse this failure I want to show all buckets with at least 3 docs in them, so I raised "min_doc_count" to 3.

Aggregation Query 2

Well (or not so well...), now there is no result, see for yourself:

Aggregation Query 2 - Result

And now it gets funny. I know that there have to be 296 results, and from these 296 Elasticsearch should return 10 (because of size in terms).

I found multiple strange "solutions" but I do not know why they work.

I change the order property in the terms from "ealiest_hit" to "_terms"
=> returns 3 buckets
I change the order property in the terms from "ealiest_hit" to "_count"
=> returns as many buckets as I specified in "size" in "terms"
I change the field property in min / max from @timestamp to doc_count
=> returns 4 buckets
I increase size in terms
=> from 10 to 100 returns 2 buckets
=> from 10 to 1000 returns 81 buckets
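One possible explanation for these observations, offered as a guess rather than a diagnosis: as far as the terms aggregation documentation describes it, each shard returns only its top shard_size buckets according to the requested order, and min_doc_count is applied after the shard results are merged. If the buckets are ordered by earliest timestamp, the candidates from each shard are mostly ordinary 2-doc sessions, which are then filtered out. The toy simulation below imitates that behaviour on made-up data. It is a simplified model, not Elasticsearch code, and the shard layout, session counts and the default shard_size formula (size * 1.5 + 10) are assumptions.

```python
from collections import defaultdict


def shard_candidates(docs, shard_size):
    """One shard: group by session, order by earliest timestamp, keep shard_size buckets."""
    buckets = defaultdict(lambda: {"doc_count": 0, "earliest": float("inf")})
    for session_id, ts in docs:
        bucket = buckets[session_id]
        bucket["doc_count"] += 1
        bucket["earliest"] = min(bucket["earliest"], ts)
    ordered = sorted(buckets.items(), key=lambda kv: kv[1]["earliest"])
    return ordered[:shard_size]


def coordinate(shards, size, min_doc_count, shard_size=None):
    """Coordinating node: merge shard candidates, then apply min_doc_count and size."""
    if shard_size is None:
        shard_size = int(size * 1.5 + 10)  # assumed default
    merged = defaultdict(lambda: {"doc_count": 0, "earliest": float("inf")})
    for docs in shards:
        for session_id, bucket in shard_candidates(docs, shard_size):
            target = merged[session_id]
            target["doc_count"] += bucket["doc_count"]
            target["earliest"] = min(target["earliest"], bucket["earliest"])
    kept = [kv for kv in merged.items() if kv[1]["doc_count"] >= min_doc_count]
    kept.sort(key=lambda kv: kv[1]["earliest"])
    return kept[:size]


# Synthetic data: 1000 sessions with 2 docs each; every 10th session has a 3rd doc.
# Documents of one session are spread over 3 shards (no routing by session).
NUM_SHARDS = 3
shards = [[] for _ in range(NUM_SHARDS)]
for i in range(1000):
    doc_total = 3 if i % 10 == 9 else 2
    for k in range(doc_total):
        shards[(i + k) % NUM_SHARDS].append((f"session-{i}", i * 10 + k))

few = coordinate(shards, size=10, min_doc_count=3)
many = coordinate(shards, size=1000, min_doc_count=3)
print(len(few), "buckets with size=10")
print(len(many), "buckets with size=1000")
```

In this model the small size returns fewer buckets than requested even though 100 sessions qualify, while the large size finds all of them, which resembles what I see when I increase size. Ordering by "_count" would put the 3-doc sessions at the top of every shard's candidate list, which would fit the second workaround.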

I hope someone can help me. Maybe my whole search query is wrong, or you have a better idea; let me know. Is this a bug? Should I open an issue on GitHub?

Not related to the problem, but I also want to count the aggregated buckets. If I search for min_doc_count = 3, I want the first 10 aggregated buckets, but I also want the total number of matching buckets (in my case 296). Is this possible, and if so, how?

I hope this makes sense; if not, please ask me about what you did not get. It is very easy for me to change some values and test it, so if you have any idea, let me know.

Thanks

PS: at first I hit the character limit (23000 chars are a bit too much), haha

Mark_Harwood · August 22, 2017, 11:15am

The more of the statements below that are true, the harder the problem you face:

SessionID is a high cardinality field
You are using time-based indices
You have multiple shards
You are not routing documents using session ID.

If these are true it's not a great architecture/data model for attempting behavioural analytics.
You may want to check out entity-centric indexing
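To give a rough idea of what entity-centric indexing means here: instead of aggregating request/response events at query time, you maintain one summary document per session and query those. The sketch below folds events into per-session entities in plain Python. The event fields "type" and "@timestamp" and the entity fields "doc_count", "first_seen", "last_seen" and "types" are assumptions for illustration, not a prescribed schema.

```python
def build_session_entities(events):
    """Fold raw events into one summary (entity) per sessionid."""
    entities = {}
    for event in events:
        entity = entities.setdefault(
            event["sessionid"],
            {"doc_count": 0, "first_seen": None, "last_seen": None, "types": []},
        )
        ts = event["@timestamp"]  # ISO 8601 UTC strings in one format sort lexicographically
        entity["doc_count"] += 1
        entity["first_seen"] = ts if entity["first_seen"] is None else min(entity["first_seen"], ts)
        entity["last_seen"] = ts if entity["last_seen"] is None else max(entity["last_seen"], ts)
        entity["types"].append(event["type"])
    return entities


events = [
    {"sessionid": "a", "type": "request", "@timestamp": "2017-08-22T10:00:00Z"},
    {"sessionid": "a", "type": "response", "@timestamp": "2017-08-22T10:00:01Z"},
    {"sessionid": "a", "type": "response", "@timestamp": "2017-08-22T10:00:02Z"},
    {"sessionid": "b", "type": "request", "@timestamp": "2017-08-22T10:05:00Z"},
    {"sessionid": "b", "type": "response", "@timestamp": "2017-08-22T10:05:01Z"},
]

entities = build_session_entities(events)
# Sessions with a duplicate response become a plain filter on doc_count.
suspicious = {sid: e for sid, e in entities.items() if e["doc_count"] >= 3}
print(len(suspicious), "session(s) with 3 or more documents:", sorted(suspicious))
```

Once each session is its own document, "sessions with at least 3 docs" is an ordinary filter on a numeric field, sorting by first_seen is an ordinary sort, and the total count is just the number of hits.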

system · September 19, 2017, 11:15am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
