We have a large cluster (50TB) that doesn't require near-real-time search accuracy, and we want to boost indexing performance there, so we thought we should increase the refresh_interval for our indices.
We wanted to know if this has any drawbacks we should consider. For example, let's say we set it to 15m:
- What happens if a node fails? Will we have data loss?
- What happens when all indices actually refresh? Will we have a huge load at that time?
Setting the refresh interval to a higher value won't have any implications for data safety. It's really just there to control visibility from the search / retrieval perspective. However, it will hold on to on-disk and in-memory data structures until the next refresh releases them, which might have some disk space implications you want to consider. When you refresh after a longer period, it will need to do some heavy lifting, but it's not considered a huge load. There is some use-case dependency here, though, so maybe start by raising it to 30s and see how it goes. I suspect it will behave similarly to 15m, to be honest.
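For reference, here is a minimal sketch of how the suggested 30s value could be applied through the update index settings API (`PUT <index>/_settings`). The host `http://localhost:9200`, the index name `my-index`, and the helper `build_refresh_interval_request` are assumptions made up for illustration, not taken from the original setup. The script only builds and prints the request; it does not contact a cluster.

```python
import json
import urllib.request


def build_refresh_interval_request(base_url, index, interval):
    """Build (but do not send) a PUT <index>/_settings request.

    base_url and index are placeholders; interval is a time value
    such as "30s" or "15m".
    """
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host and index name, for illustration only.
req = build_refresh_interval_request("http://localhost:9200", "my-index", "30s")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# To apply the setting against a reachable cluster, the request could be sent with:
# urllib.request.urlopen(req)
```

Starting at 30s and raising the value later only means calling the same helper again with a different interval such as "15m", so it is cheap to experiment and compare indexing throughput.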