Aggregation across multiple indices misses matches, with or without Kibana


(Noel) #1

Hi
I am using Kibana on 2 indices, each with one shard.

On the x axis, I aggregate counts of a field, say field_A, then split the chart by field_B. The first index has 1 doc with a unique field_A value; the second index has the same field_A value with 2 different field_B values, as follows:

First index:
doc:
_field_A:"mykey1"
_field_B:"mysubkeyI"

Second index:
doc:
_field_A:"mykey1"
_field_B:"mysubkeyII"
doc:
_field_A:"mykey1"
_field_B:"mysubkeyIII"

When everything is working fine, the result displays 3 docs.
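To make the expected numbers concrete, here is a minimal in-memory sketch of the aggregation described above. It does not talk to Elasticsearch; the `first_index` and `second_index` lists and the `aggregate` helper are hypothetical stand-ins for the two indices and the field_A / field_B bucketing.

```python
from collections import Counter

# Hypothetical in-memory stand-ins for the two indices described above.
first_index = [{"field_A": "mykey1", "field_B": "mysubkeyI"}]
second_index = [
    {"field_A": "mykey1", "field_B": "mysubkeyII"},
    {"field_A": "mykey1", "field_B": "mysubkeyIII"},
]


def aggregate(*indices):
    """Bucket docs by field_A, then count docs per field_B inside each bucket."""
    buckets = {}
    for index in indices:
        for doc in index:
            buckets.setdefault(doc["field_A"], Counter())[doc["field_B"]] += 1
    return buckets


buckets = aggregate(first_index, second_index)
total = sum(buckets["mykey1"].values())
print(total)  # 3

# Remove the "mysubkeyII" doc from the second index and aggregate again.
second_after_removal = [d for d in second_index if d["field_B"] != "mysubkeyII"]
buckets_after = aggregate(first_index, second_after_removal)
total_after = sum(buckets_after["mykey1"].values())
print(total_after)  # 2
```

With exact counting, removing one doc leaves 2 docs under "mykey1", which is the expected result listed under Problem 1.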

Problem 1:
So, when I remove a doc as a test, I expect the same aggregation query to return 2 docs.

Expected result after removing a doc:
First index:
doc:
_field_A:"mykey1"
_field_B:"mysubkeyI"

Second index:
doc:
_field_A:"mykey1"
_field_B:"mysubkeyIII"

Instead, the result:
Second index:
doc:
_field_A:"mykey1"
_field_B:"mysubkeyIII"

Problem 2:
It also messed up 1 other result, so more than just the doc I expected to be missing is gone.

What is wrong with aggregation across multiple indices?

Thanks a lot.


(Shaunak Kashyap) #2

Sorry, I didn't understand your question. Would you mind sharing some screenshots? That might help clarify the problem.


(Noel) #3

Hi
I verified the issue by copying the Kibana query and narrowed it down.
My doc looks like this,
doc
_fieldA
_fieldB
_timefield

In Kibana,
It's a histogram with a time range aggregation (when you click on the top right) (step *)
Then I did an X-axis aggregation by count of fieldA (count 20, ascending) (step **)
Then I sub-aggregated unique by fieldB (count 20, ascending) (step ***)
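The three steps above could correspond to a request body along these lines. This is only a sketch: the thread does not show the query Kibana generated, so the aggregation names (`by_fieldA`, `by_fieldB`), the `build_request` helper, and the time range values are assumptions, and the "unique by fieldB" step is modeled here as a nested terms aggregation.

```python
import json


def build_request(time_field, gte, lte, size=20):
    """Build a search body: time range filter, then terms on fieldA, then fieldB."""
    ascending_terms = {"size": size, "order": {"_count": "asc"}}
    return {
        "size": 0,  # only aggregation results are needed, not the hits
        # step *: restrict to the selected time range
        "query": {"range": {time_field: {"gte": gte, "lte": lte}}},
        "aggs": {
            # step **: bottom `size` fieldA values by document count
            "by_fieldA": {
                "terms": dict(ascending_terms, field="fieldA"),
                "aggs": {
                    # step ***: split each fieldA bucket by fieldB
                    "by_fieldB": {"terms": dict(ascending_terms, field="fieldB")}
                },
            }
        },
    }


# Hypothetical time range; "timefield" is the field name from the doc layout above.
body = build_request("timefield", "now-30d", "now")
print(json.dumps(body, indent=2))
```

Pasting the printed JSON into Sense against both indices is one way to reproduce the behavior outside Kibana.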

Every time I ran the query, Kibana would complain that the result set is too big to render, but it eventually displays 20 bars, as I only need the bottom 20 results (and on the y axis, it splits each bar as in step ***).

So, usually at step *, I gave a large time frame (approx. 200k records), then step ** aggregated them to around 1/3 of the result set (60k) and picked the bottom 20 results.
In that case, the returned results are messed up.

When I give a narrow time frame, where step * returns fewer than 3000 records, step ** aggregates them to around 1000 records and picks the bottom 20 results, and I get my desired results.

This is the behavior I see in Kibana, so I took the query and reproduced the same issue in Sense.

So, does aggregation not work well with a large number of records? I have 2 indices and 1 shard per index.


(Spencer Alger) #4

The issue is probably the date histogram, not the terms or unique aggregations. I imagine two possibilities:

  1. The field you are using for your date histogram is not set as the time field for the Index Pattern. You assign this field when you create your Index Pattern, and the chosen field is the only one that is affected by the time filter.

    Since you can only have one time field per index pattern, you may need to create a new index pattern with a similar but slightly different pattern (try replacing one or more of the letters in the pattern with a *)

  2. The date histogram is not using "auto" interval, so when the time range is large it creates a ton of small buckets. Auto isn't available unless you resolve #1.
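To illustrate the second point, here is a rough calculation of how many buckets a fixed interval produces over a time range. The `bucket_count` helper and the 30-day range are hypothetical, and the real count can differ slightly because a date histogram aligns buckets to interval boundaries.

```python
import math
from datetime import datetime, timedelta


def bucket_count(start, end, interval):
    """Approximate number of date histogram buckets covering [start, end)."""
    return math.ceil((end - start) / interval)


# Hypothetical 30-day time range.
start = datetime(2015, 6, 1)
end = start + timedelta(days=30)

per_minute = bucket_count(start, end, timedelta(minutes=1))
per_day = bucket_count(start, end, timedelta(days=1))
print(per_minute)  # 43200
print(per_day)     # 30
```

A fixed one-minute interval yields tens of thousands of buckets over a month, while a coarser interval keeps the chart manageable.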


(Noel) #5

Hi

I want to update you regarding this issue, just in case anyone is seeing the same.

The solution that seems to be working is using 1 index with 1 shard rather than 2 indices (INDEXNAME_*), each with 1 shard. Since their data structures are the same, I just put them together as 1 index.
The result is accurate now.

Regarding the time field for the date histogram, yes, I do have that and chose the right time field. It always works when the data size is small, but when the amount of data increased, the multiple shards probably did not work well with the aggregation.
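One possible explanation for the multi-shard behavior, not confirmed anywhere in this thread, is that a terms aggregation ordered by ascending count is approximate when several shards are involved: each shard reports only its own lowest-count terms, and the partial lists are merged afterwards. The sketch below is a simplified model of that idea, not Elasticsearch's actual implementation; the shard contents and the `bottom_n`, `exact`, and `shard_local` helpers are made up for illustration.

```python
from collections import Counter


def bottom_n(counts, n):
    """Return the n terms with the smallest counts (ties broken by term)."""
    return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))[:n]


def exact(shards, n):
    """Sum counts over all shards first, then pick the bottom n (single-shard view)."""
    total = Counter()
    for shard in shards:
        total.update(shard)
    return bottom_n(total, n)


def shard_local(shards, n):
    """Each shard reports only its own bottom n terms; the partial lists are merged."""
    merged = Counter()
    for shard in shards:
        merged.update(dict(bottom_n(shard, n)))
    return bottom_n(merged, n)


# Hypothetical per-shard term counts: "a" is rare on shard 1 but common on shard 2.
shard_1 = {"a": 1, "b": 5, "c": 6}
shard_2 = {"a": 50, "b": 2, "c": 3}

print(exact([shard_1, shard_2], 1))        # [('b', 7)]
print(shard_local([shard_1, shard_2], 1))  # [('a', 1)]
```

In this model the merged result reports "a" with a count of 1 even though it occurs 51 times overall, while a single shard sees every count and returns the true bottom term. That would be consistent with the results becoming accurate after moving to 1 index with 1 shard.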

Thanks

