- Cluster: 4 nodes, 2 master-only (no data) + 2 data-only (not master-eligible)
- Data: not much, ~10,000 small documents
- Uptime: about a month without a restart; load: fewer than 10 query/index requests per second
- OS: Debian 8
Unexpectedly, the cluster hung on the _search endpoint.
Getting a document still works, and _cat still works, but any _search request hangs and never returns.
_cat/nodes shows all nodes, and _cat/health reports everything green.
The data nodes' logs contain no errors at all; the master node's log has transport errors involving the first data node:
[2015-07-11 19:29:27,253][DEBUG][action.search.type ] [web1] [ullogin], node[5EQm6ReJRK6j6VByOis2og], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@14f73de] lastShard [true]
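To see which node and index a failure like this refers to, the bracketed fields can be pulled out of the log line. The following is only a rough sketch: the function name `parse_failed_search` and the field names are made up for illustration, and it assumes the fields mean (in order) the logging node, the index, the target node ID, primary/replica marker, and shard state.

```python
import re

# Assumed layout, based only on the single log line quoted above:
# [logging node] [index], node[node id], [P or R], s[shard state]
LOG_PATTERN = re.compile(
    r"\[(?P<logging_node>[^\]]+)\] "
    r"\[(?P<index>[^\]]+)\], "
    r"node\[(?P<node_id>[^\]]+)\], "
    r"\[(?P<copy>[PR])\], "
    r"s\[(?P<state>[A-Z_]+)\]"
)


def parse_failed_search(line):
    """Return the bracketed fields as a dict, or None if the line does not match."""
    match = LOG_PATTERN.search(line)
    return match.groupdict() if match else None
```

The extracted node ID can then be compared with the cluster's node list to confirm which data node the master could not reach.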
Restarting the first data node did not resolve the problem, and neither did restarting the non-data nodes.
The cluster became usable again ONLY after the second data node was restarted.
So it's a pretty bad situation: I have no useful log information and no problem indicators from the _cat commands, but the cluster is in fact unavailable (_search does not work).
So the only option for now is to watch the cluster manually and restart the data nodes if they get into this state again.
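Since _cat/health stays green while _search hangs, a manual check has to probe _search itself with a timeout. Below is a minimal sketch of such a check, not a definitive solution. The names `probe` and `diagnose`, the `opener` parameter, the 5-second timeout, and the use of `/_search?size=0` as a cheap test query are all assumptions; what to do on "search-hung" (alerting, restarting a data node) is left out on purpose.

```python
import http.client
import urllib.request


def probe(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if `url` answers with HTTP 200 within `timeout` seconds."""
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Covers timeouts, refused connections and HTTP error statuses
        # (urllib's URLError and HTTPError are subclasses of OSError).
        return False


def diagnose(base_url, timeout=5.0, opener=urllib.request.urlopen):
    """Classify the cluster as 'ok', 'search-hung' or 'unreachable'."""
    health_ok = probe(base_url + "/_cat/health", timeout, opener)
    search_ok = probe(base_url + "/_search?size=0", timeout, opener)
    if health_ok and search_ok:
        return "ok"
    if health_ok:
        # The state described above: _cat answers, _search does not.
        return "search-hung"
    return "unreachable"
```

It could be run from cron against any node, for example `diagnose("http://localhost:9200")`, where the address is a placeholder for a real node.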