Estimating Filter Cache Needs


(Mike Sukmanowsky) #1

Is there an easy way (even if not entirely accurate) to estimate the size of an individual filter in the filter cache if we know the approximate number of documents the index holds? I realize it's a bit tricky since the filter cache is node-level, not index-level, by default.

If it were a straight non-sparse bitset, then a filter would be n bits in size: 1B documents = 1B bits = 125MB. But I'm guessing ES uses a more clever bitset implementation than that.
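The dense-bitset arithmetic above can be sketched out like this (a back-of-the-envelope sketch only; a real cached filter also carries Java object and array overhead):

```python
# Dense (non-sparse) bitset: one bit per document in the index,
# regardless of how many documents actually match the filter.
def dense_bitset_bytes(num_docs):
    return num_docs // 8  # 8 bits per byte

mb = dense_bitset_bytes(1_000_000_000) / (1000 * 1000)
print(mb)  # 125.0 MB for 1B documents
```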


(Jason Wee) #2

Do you mean, for a filter query, how much heap that query occupies in the node filter cache? If so, there are two stats APIs that report it: https://www.elastic.co/guide/en/elasticsearch/reference/1.5/indices-stats.html and https://www.elastic.co/guide/en/elasticsearch/reference/1.5/cluster-nodes-stats.html
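For example, the nodes stats API can be filtered down to just the filter cache metric. A sketch of reading the relevant fields from such a response (the JSON body below is a hypothetical, heavily abridged example of what ES 1.5 returns; the real response carries many more fields per node):

```python
import json

# Hypothetical, abridged response body from
# GET /_nodes/stats/indices/filter_cache on an ES 1.5 cluster.
sample = """
{
  "nodes": {
    "abc123": {
      "name": "node-1",
      "indices": {
        "filter_cache": {"memory_size_in_bytes": 1048576, "evictions": 0}
      }
    }
  }
}
"""

stats = json.loads(sample)
for node in stats["nodes"].values():
    fc = node["indices"]["filter_cache"]
    print("%s: filter cache = %d bytes, evictions = %d"
          % (node["name"], fc["memory_size_in_bytes"], fc["evictions"]))
```

Note this reports the whole cache per node, not a single filter, which is the limitation Mike runs into below.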

hth

jason


(Mike Sukmanowsky) #3

I guess that's an easy way to estimate things, but in our prod environment we can't ensure that only one query is executing against the node unless we pull that node out of the cluster (which wouldn't be a great idea).

Doing some research, it seems ES uses a SparseFixedBitSet under the hood for caching filters. From the docs:

A bit set that only stores longs that have at least one bit which is set. The way it works is that the space of bits is divided into blocks of 4096 bits, which is 64 longs. Then for each block, we have:

- a long[] which stores the non-zero longs for that block
- a long so that bit i being set means that the i-th long of the block is non-null, and its offset in the array of longs is the number of one bits on the right of the i-th bit.

That's a bit tricky to understand, and I'm not exactly sure how to translate it into an estimate for our case.
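One way to turn that description into a rough number: each 4096-bit block costs one "index" long, plus one long for every 64-bit word that holds at least one set bit. A sketch of an upper-bound estimate, assuming set bits are spread evenly (the worst case for a sparse structure, since clustered bits share words) and ignoring Java object/array header overhead:

```python
import math

def sparse_bitset_estimate(max_doc, set_bits):
    """Rough upper-bound memory estimate (bytes) for a SparseFixedBitSet,
    assuming set bits are spread evenly across the document space."""
    blocks = math.ceil(max_doc / 4096)
    # one index long per 4096-bit block
    index_bytes = blocks * 8
    # evenly-spread bits means up to one non-zero 64-bit word per set bit,
    # capped at the 64 words each block can hold
    nonzero_longs = min(set_bits, blocks * 64)
    return index_bytes + nonzero_longs * 8

# 1B-doc index, filter matching 1M docs: a few MB instead of 125MB
print(sparse_bitset_estimate(1_000_000_000, 1_000_000))
# Dense filter (every doc matches): back to dense-bitset territory
print(sparse_bitset_estimate(1_000_000_000, 1_000_000_000))
```

So the sparser the filter, the cheaper it is to cache; when nearly every document matches, the cost approaches the plain 1-bit-per-document figure plus a small per-block overhead.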
