ES Aggregation (Bug?) - No bucket results at high "min_doc_count" and low "size"


(Tim Pütz) #1

Hey,
I have 2 different types of documents.

  • Request
  • Response

They share the same sessionid.

As a result, I want the documents aggregated and grouped by their sessionid. This is what I did:

  1. aggregations by terms sorting by the lowest timestamp
  2. top_hits to include the _source + sorting the docs, newest first
  3. min and max of @timestamp to sort the buckets later on
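The three steps above can be sketched as a request body. This is only a rough reconstruction for illustration, written as a Python dict: the field names `sessionid` and `@timestamp` and the aggregation name `ealiest_hit` come from this post, while `sessions`, `docs`, `latest_hit` and the helper function are hypothetical names, and the real query may differ.

```python
import json


def build_session_aggregation(size=10, min_doc_count=2):
    """Build a terms aggregation on sessionid, ordered by the earliest timestamp.

    Hypothetical helper; the names "sessions", "docs" and "latest_hit" are assumptions.
    """
    return {
        "size": 0,  # only the aggregation result is of interest, not the search hits
        "aggs": {
            "sessions": {
                # step 1: one bucket per sessionid, sorted by the lowest timestamp
                "terms": {
                    "field": "sessionid",
                    "size": size,
                    "min_doc_count": min_doc_count,
                    "order": {"ealiest_hit": "asc"},
                },
                "aggs": {
                    # step 2: include _source of the docs in each bucket, newest first
                    "docs": {
                        "top_hits": {
                            "size": 10,
                            "sort": [{"@timestamp": {"order": "desc"}}],
                        }
                    },
                    # step 3: min and max of @timestamp, usable for sorting buckets
                    "ealiest_hit": {"min": {"field": "@timestamp"}},
                    "latest_hit": {"max": {"field": "@timestamp"}},
                },
            }
        },
    }


body = build_session_aggregation(min_doc_count=3)
print(json.dumps(body, indent=2))
```

Raising `min_doc_count` from 2 to 3 only changes one value in the `terms` block; everything else stays the same.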

For this I wrote this aggregation:

Aggregation Query 1

Aggregation Query 1 - Result

As long as "min_doc_count" = 2, it works well. It shows the aggregated request and response with their source, all sorted. But the result also contains buckets with a doc_count of 3. Those are failures, where the response server sent a second response, for whatever reason. To analyse this failure I want to show all buckets with at least 3 docs in them, so I raised "min_doc_count" to 3.

Aggregation Query 2

Well (or not so well...), now there is no result, see for yourself:

Aggregation Query 2 - Result

And now it gets funny. I know that there have to be 296 results, and of these 296 Elasticsearch should return 10 (because of size in terms).

I found multiple strange "solutions" but I do not know why they work.

  1. I change the order property in the terms from ealiest_hit to "_terms"
    => returns 3 buckets
  2. I change the order property in the terms from "ealiest_hit" to "_count"
    => returns as many buckets as I specified in "size" in "terms"
  3. I change the field property in min / max from @timestamp to doc_count
    => returns 4 buckets
  4. I increase size in terms
    => from 10 to 100 returns 2 buckets
    => from 10 to 1000 returns 81 buckets

I hope someone can help me. Maybe my whole search query is wrong, or you have a better idea; let me know. Is this a bug? Should I open an issue on GitHub?

Not related to the problem, but I also want to count the aggregated buckets. If I search with min_doc_count = 3, I want the first 10 aggregated buckets, but I also want the total number of matching buckets (in my case 296). Is this possible, and if so, how?

I hope this makes sense; if not, please ask me about what you did not get. It is very easy for me to change some values and test it, so if you have any idea, let me know :)

Thanks

PS: At first I hit the character limit (23000 chars are a bit too much), haha


(Mark Harwood) #2

The more of the statements below that are true, the harder the problem you face:

  • SessionID is a high cardinality field
  • You are using time-based indices
  • You have multiple shards
  • You are not routing documents using session ID.
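To make the points above concrete, here is a toy Python model, not Elasticsearch's actual code. It assumes what the terms aggregation documentation describes: each shard returns only its top `shard_size` buckets in the requested order, and `min_doc_count` is applied only after the shard results are merged. The function names and the data are made up for illustration.

```python
from collections import defaultdict


def shard_top_buckets(docs, shard_size):
    """Bucket one shard's (session_id, timestamp) docs and keep the shard_size earliest."""
    buckets = defaultdict(lambda: {"doc_count": 0, "earliest": float("inf")})
    for session_id, ts in docs:
        bucket = buckets[session_id]
        bucket["doc_count"] += 1
        bucket["earliest"] = min(bucket["earliest"], ts)
    ranked = sorted(buckets.items(), key=lambda kv: kv[1]["earliest"])
    return ranked[:shard_size]  # truncation happens BEFORE min_doc_count is checked


def reduce_shards(shard_results, size, min_doc_count):
    """Merge the per-shard buckets, then apply min_doc_count and size."""
    merged = defaultdict(lambda: {"doc_count": 0, "earliest": float("inf")})
    for result in shard_results:
        for session_id, bucket in result:
            target = merged[session_id]
            target["doc_count"] += bucket["doc_count"]
            target["earliest"] = min(target["earliest"], bucket["earliest"])
    kept = [(sid, b) for sid, b in merged.items() if b["doc_count"] >= min_doc_count]
    kept.sort(key=lambda kv: kv[1]["earliest"])
    return kept[:size]


def make_docs():
    """100 sessions on one shard: the 90 earliest have 2 docs, the 10 latest have 3."""
    docs = []
    for i in range(100):
        doc_count = 3 if i >= 90 else 2
        for k in range(doc_count):
            docs.append((f"s{i}", i * 10 + k))
    return docs


docs = make_docs()
# shard_size 25 mirrors the documented default of size * 1.5 + 10 for size = 10
small = reduce_shards([shard_top_buckets(docs, 25)], size=10, min_doc_count=3)
large = reduce_shards([shard_top_buckets(docs, 100)], size=10, min_doc_count=3)
print(len(small), len(large))
```

In this model the small candidate list is filled entirely with early 2-doc sessions, so nothing survives the `min_doc_count` filter, while a larger `shard_size` lets the 3-doc sessions through. With several shards and no routing by session ID, a session's docs can also be split across shards, which makes the per-shard candidates even less reliable.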

If these are true, it's not a great architecture/data model for attempting behavioural analytics.
You may want to check out entity-centric indexing.
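As a rough sketch of the entity-centric idea, assuming it means keeping one summary document per session that is updated as events arrive: the Python below folds request/response events into session entities in memory. The event shape, the field names other than `sessionid`, and the function name are assumptions for illustration; in practice the entities would be written to their own index.

```python
def build_session_entities(events):
    """Fold individual events into one summary record per sessionid."""
    sessions = {}
    for event in events:
        entity = sessions.setdefault(
            event["sessionid"],
            {
                "sessionid": event["sessionid"],
                "doc_count": 0,
                "first_seen": event["timestamp"],
                "last_seen": event["timestamp"],
                "types": [],
            },
        )
        entity["doc_count"] += 1
        entity["first_seen"] = min(entity["first_seen"], event["timestamp"])
        entity["last_seen"] = max(entity["last_seen"], event["timestamp"])
        entity["types"].append(event["type"])
    return sessions


# Made-up sample data; timestamps are epoch milliseconds.
events = [
    {"sessionid": "a", "type": "request", "timestamp": 1000},
    {"sessionid": "a", "type": "response", "timestamp": 1040},
    {"sessionid": "b", "type": "request", "timestamp": 2000},
    {"sessionid": "b", "type": "response", "timestamp": 2030},
    {"sessionid": "b", "type": "response", "timestamp": 2090},
]

entities = build_session_entities(events)
# Sessions with at least 3 docs, earliest first
failures = sorted(
    (e for e in entities.values() if e["doc_count"] >= 3),
    key=lambda e: e["first_seen"],
)
print(len(failures), [e["sessionid"] for e in failures])
```

With one document per session, "sessions with at least 3 docs" becomes a plain filter on a stored count, and the total number of such sessions is just the hit count rather than a bucket count.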

