Stream indexing and querying filter caching


(Ilija Subasic) #1

Hi,
We have a system that does stream indexing for an online system with a large number of filtered queries. As the filter scope changes, it looks like there are a lot of filter evictions going on. I have a couple of questions, if anyone can help:

a) Is a filter rebuilt with each newly indexed file that fits into its scope? Does this happen at query time?
b) What is the order of filter eviction (e.g. FIFO)?
c) Does anyone have experience with the performance of manually controlling some filters?
d) What is the cost of continuously adding documents to filters (e.g. imagine that I have a filter on a "document_type" field and that every second I index 100 new documents of one type)?

Any help would be highly welcome. I browsed the code a bit, but am not 100% sure I understand everything about filter caching and building.

Best,
Ilija


(Mark Walkom) #2

Only when a segment is created/merged.

Least recently used.

You can tune some things but you need to be careful as you can end up causing more issues. Can you increase heap?

See 1; you probably wouldn't notice it under normal loads though.
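
To make the two points above concrete (filters are built per segment, and eviction is least recently used), here is a toy sketch in Python. It is not Elasticsearch code: the class `SegmentFilterCache`, the segment names and the `max_entries` limit are all made up for illustration, and a real cache is bounded by memory rather than by entry count.

```python
from collections import OrderedDict


class SegmentFilterCache:
    """Toy LRU cache keyed by (segment, filter). Not Elasticsearch code."""

    def __init__(self, max_entries):
        self.max_entries = max_entries  # hypothetical limit, counted in entries
        self.entries = OrderedDict()    # least recently used entry comes first
        self.builds = 0
        self.evictions = 0

    def get(self, segment_id, filter_key, segment_docs, predicate):
        key = (segment_id, filter_key)
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        # Cache miss: build the match set for this one segment only.
        matches = frozenset(
            i for i, doc in enumerate(segment_docs) if predicate(doc)
        )
        self.builds += 1
        self.entries[key] = matches
        if len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # evict the least recently used
            self.evictions += 1
        return matches


def is_report(doc):
    return doc["document_type"] == "report"


def run_filter(cache, segments):
    """Apply the same filter to every segment, as one query would."""
    for segment_id, docs in segments.items():
        cache.get(segment_id, "type:report", docs, is_report)


segments = {
    "seg0": [{"document_type": "report"}, {"document_type": "memo"}],
    "seg1": [{"document_type": "report"}],
}
cache = SegmentFilterCache(max_entries=2)

run_filter(cache, segments)
builds_after_first_query = cache.builds
run_filter(cache, segments)               # same segments: served from cache
builds_after_second_query = cache.builds

segments["seg2"] = [{"document_type": "report"}]  # newly created segment
run_filter(cache, segments)               # only seg2 needs a build
```

In this model the second query builds nothing, and the new segment triggers exactly one build. Because the cache is capped at two entries, that build also pushes out the least recently used entry, which is the kind of churn you would see when many distinct filters compete for a fixed-size cache.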


(Ilija Subasic) #3

Thanks for the answer.

What is the worst that can happen? How much does it cost to build an index?

We are already at 32 GB, so I guess we could add more nodes. But most filters are used only once (e.g. time filters from the query moment to some predefined point).
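
To illustrate why such filters are used only once: a range that ends at the exact query moment is different for every query, so a cached copy can never be hit again. A rough sketch in Python, assuming a hypothetical `timestamp` field, a one-hour window and arbitrary example times; the dicts only mimic the shape of a range filter and nothing is sent to a cluster.

```python
from datetime import datetime, timedelta, timezone


def range_filter(field, start, end):
    """Return a range filter body as a plain dict (field name is hypothetical)."""
    return {"range": {field: {"gte": start.isoformat(), "lt": end.isoformat()}}}


def round_down(ts, step):
    """Round a timezone-aware timestamp down to a multiple of `step`."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + ((ts - epoch) // step) * step


window = timedelta(hours=1)   # assumed look-back window
step = timedelta(minutes=1)   # assumed rounding granularity

# Ten queries, seven seconds apart (arbitrary example times).
first_query = datetime(2015, 3, 1, 12, 0, 5, tzinfo=timezone.utc)
query_times = [first_query + timedelta(seconds=7 * i) for i in range(10)]

# Exact end points: every query produces a different filter.
exact_filters = {
    str(range_filter("timestamp", t - window, t)) for t in query_times
}

# End points rounded to the minute: queries in the same minute share a filter.
rounded_filters = {
    str(range_filter("timestamp", round_down(t, step) - window, round_down(t, step)))
    for t in query_times
}

print(len(exact_filters), "distinct filters with exact times")
print(len(rounded_filters), "distinct filters with rounded times")
```

Rounding shifts the window by up to one step, so it only helps if that loss of precision is acceptable. The other direction is to keep one-off filters out of the cache altogether; as far as I know the 1.x filter DSL has a per-filter `_cache` flag for this, but check the documentation for your version.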

We have waves of indexing at 10k+ documents a second; normally it is in the 100s. Plus, each indexing event is followed by a number of queries.

