I have an issue with refreshes in an ES cluster.
I have one process that does bulk indexing and a few other processes that do multiget requests. There is also an API that bulk-writes some documents to ES and must ensure the data is searchable before it returns. Calls to this API are not very frequent, but it is important that they complete as quickly as possible. To guarantee that the data is searchable once the API call returns, I tried several approaches, none of which worked well:
- adding refresh=true to bulk requests: the requests always time out if other indexing is in progress
- refreshing using the refresh API: refresh always times out (I even tried to refresh with curl and waited for up to an hour - the request would never complete!)
- doing a search using the ids query and checking that the newly written data is returned (in the hope that the automatic refresh helps): it takes ridiculously long to get the fresh data back (more than 30s with refresh_interval set to 1s).
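
For concreteness, the three approaches correspond roughly to the requests sketched below. This is only an illustration, not the actual client code: the helper names and the index name in the usage note are hypothetical, and the bulk action lines assume the typeless format of recent Elasticsearch versions (older versions also expect a `_type`). The functions only build the method, path and body; sending them is left to whatever HTTP client is in use.

```python
import json


def bulk_request(index, docs, refresh=None):
    """Build a _bulk request that indexes docs (a dict of id -> source).

    When refresh is given (for example "true"), it is appended as the
    `refresh` query parameter, as in the first approach.
    """
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_id": doc_id}}))  # action line
        lines.append(json.dumps(source))                      # document line
    path = f"/{index}/_bulk"
    if refresh is not None:
        path += f"?refresh={refresh}"
    # The bulk body is newline-delimited JSON and must end with a newline.
    return "POST", path, "\n".join(lines) + "\n"


def refresh_request(index):
    """Build an explicit refresh call for one index (second approach)."""
    return "POST", f"/{index}/_refresh", ""


def ids_search_request(index, doc_ids):
    """Build an ids-query search for the freshly written documents (third approach)."""
    doc_ids = list(doc_ids)
    body = {"query": {"ids": {"values": doc_ids}}, "size": len(doc_ids)}
    return "POST", f"/{index}/_search", json.dumps(body)


def all_visible(search_response, doc_ids):
    """Return True if every expected id appears among the search hits."""
    found = {hit["_id"] for hit in search_response["hits"]["hits"]}
    return set(doc_ids) <= found
```

With these helpers, the third approach amounts to sending the request from `ids_search_request("myindex", ids)` repeatedly until `all_visible(response, ids)` returns True.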
The refresh_interval is set to 1s (initially it was 5s, but I switched to 1s to test the third approach; it makes no real difference to the result). The index has 1000 shards spread over 170 nodes. As far as I can see, no process is CPU-bound or disk-bound.
How do I find out what's happening? What could be blocking forced refreshes, and why is refresh_interval not being honored?