Top3 in Line graph and Top values in Available fields are different

The top values are calculated over the whole time period, but separately per node in the cluster (more precisely, per shard of the queried index pattern).

More thorough explanation: Elasticsearch splits up data across multiple indices and shards sitting on different nodes. Every shard does its own processing, then sends the result to the coordinating node (the one Kibana talks to), which merges the individual nodes' results and sends the response to Kibana. However, ordering a list of terms can't be distributed across multiple nodes.

One option would be for each node to send the full list of local terms to the coordinating node which merges and orders all of these lists, then sends the top 3 to the client. However, if there are millions of terms in total, this would be super expensive as a lot of data would have to be transferred to the coordinating node (and it would also use up a lot of memory on the coordinating node).

So Elasticsearch doesn't do this. Instead, it sends only the top 15 terms or so per shard to the coordinating node. This keeps the memory usage and network traffic low, but it means the list can be wrong.

Consider the following example:

Node one has the following data:

term count
A 95
B 94
C 93
X 1
Y 2
Z 3

Node two has the following data:

term count
A 9
B 8
C 7
X 98
Y 97
Z 96

Both nodes send their top 3 (A-C for node one and X-Z for node two) to the coordinating node, which merges and sorts the partial lists and sends the top 3 of that list to the client:

combined top 3 lists from both nodes

term count
X 98
Y 97
Z 96
A 95
B 94
C 93

User sees:

term count
X 98
Y 97
Z 96

However, if each data node had sent its top 6 list instead, the outcome would have been very different:

term count
A 95 + 9 = 104
B 94 + 8 = 102
C 93 + 7 = 100
X 1 + 98 = 99
Y 2 + 97 = 99
Z 3 + 96 = 99

So the client would eventually see:

term count
A 104
B 102
C 100

This is what the "accuracy mode" is about: it increases the number of terms transferred from the data nodes to the coordinating node. That is more expensive to calculate, but there is a smaller chance of getting the wrong top values.
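If you query Elasticsearch directly, the terms aggregation has a `shard_size` parameter that controls how many terms each shard returns to the coordinating node. The sketch below only builds a request body as a Python dict. The field name, the aggregation name `top_values` and the helper `terms_agg_body` are placeholders, and whether Kibana's accuracy mode sets exactly this parameter (and to which value) is an assumption on my side.

```python
import json

# Hypothetical field name; replace it with the field you are charting.
FIELD = "my_field.keyword"


def terms_agg_body(size, shard_size):
    """Build a search body with a terms aggregation (illustrative helper)."""
    return {
        "size": 0,  # no hits needed, only the aggregation result
        "aggs": {
            "top_values": {
                "terms": {
                    "field": FIELD,
                    "size": size,              # terms returned to the client
                    "shard_size": shard_size,  # terms each shard sends back
                }
            }
        },
    }


# Top 3 for the client, but ask every shard for its top 1000 terms.
print(json.dumps(terms_agg_body(size=3, shard_size=1000), indent=2))
```

A larger `shard_size` trades memory and network traffic for accuracy, which is the same trade-off described above.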
