Any general recommendations for an index that is constantly being updated but still needs good search performance?
We might be indexing a few hundred documents in ES every 5 minutes. Will caching for filters and facets be severely impacted? We have 5 shards, and I was thinking of increasing the refresh interval to 30s, and maybe adding more shards to reduce the caching issues (since the shards would be spread across more nodes).
Without knowing your queries, it is impossible to give a precise answer, so here is some general advice and a few general rules.
If you run aggregations over old and new documents together while the index is busy with updates, you will constantly invalidate previous aggregation results, and the caches will be rebuilt no matter whether you have 1, 5, 10, or more shards; in fact, the more shards you have, the more work there is to do.
If your data and queries are organized around time, you can use time-frame indices. Put as much data as possible into immutable indices, and add a filter to your queries to target that old data. On immutable indices you can aggregate without cache invalidation, and you can also run force_merge on them to gain more search performance.
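As a sketch of the time-frame idea (the `docs-YYYY.MM.DD` naming, the daily rollover, and the helper names are my assumptions for illustration, not anything from your setup):

```python
from datetime import date, timedelta

def index_for_day(d: date) -> str:
    """Map a day to its time-frame index, e.g. 'docs-2024.05.17' (naming is hypothetical)."""
    return f"docs-{d:%Y.%m.%d}"

def search_targets(today: date, days_back: int):
    """Split search targets into immutable (older) indices and today's mutable index.

    Queries and aggregations against the immutable list stay cacheable, because
    those indices never change; only today's index receives updates."""
    immutable = [index_for_day(today - timedelta(days=i)) for i in range(1, days_back + 1)]
    mutable = index_for_day(today)
    return immutable, mutable

immutable, mutable = search_targets(date(2024, 5, 17), days_back=3)
# immutable -> ['docs-2024.05.16', 'docs-2024.05.15', 'docs-2024.05.14']
# mutable   -> 'docs-2024.05.17'
```

Once a day's index has rolled over and become immutable, that is the point where a force_merge (e.g. `POST /docs-2024.05.16/_forcemerge?max_num_segments=1`) pays off.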
For a few hundred documents every 5 minutes, there is not much to gain from setting the index refresh interval to 30 seconds; increasing the refresh interval helps with massive bulk rates of several thousand documents per second. What you should do is collect your documents into as few bulk requests as possible and send them at long intervals (e.g. every 5 minutes). This saves Elasticsearch from creating many small segments that would have to be merged later.
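One way to collect documents into few, large bulk requests is a simple buffer that flushes on a timer. This is a minimal sketch; the `BulkBuffer` class and the 5-minute threshold are my own illustration, only the newline-delimited body format matches the `_bulk` API:

```python
import json
import time

class BulkBuffer:
    """Accumulate documents and emit one _bulk request body per flush interval."""

    def __init__(self, index: str, flush_interval_s: float = 300.0):
        self.index = index
        self.flush_interval_s = flush_interval_s  # e.g. 5 minutes
        self.docs = []
        self.last_flush = time.monotonic()

    def add(self, doc: dict):
        self.docs.append(doc)

    def due(self) -> bool:
        """True when the flush interval has elapsed since the last flush."""
        return time.monotonic() - self.last_flush >= self.flush_interval_s

    def flush(self) -> str:
        """Build the newline-delimited _bulk body (action line + source line
        per document) and reset the buffer."""
        lines = []
        for doc in self.docs:
            lines.append(json.dumps({"index": {"_index": self.index}}))
            lines.append(json.dumps(doc))
        self.docs = []
        self.last_flush = time.monotonic()
        return "\n".join(lines) + "\n" if lines else ""

buf = BulkBuffer("docs")
buf.add({"title": "a"})
buf.add({"title": "b"})
body = buf.flush()
# body has 4 lines: one action line plus one source line per document
```

In practice you would POST `body` to `_bulk` whenever `due()` returns true, so each 5-minute window produces a single request instead of hundreds of tiny ones.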