Strange errors under very little read/write load when running as cluster but stable on a single node

HI,

I'm experiencing a very strange behaviour with my elasticsearch setup on
AWS.
It is a simple cluster with two nodes on seperate EC2 instances (tried
types: t2.micro, t2.small & m3.medium).

As long as i'm only doing write OR read operations everything is fine and i
get reasonable performance (my testscript causes just a few (~10) writes
per second and/or ~100-200 reads per seconds) - it becomes slow on the
micro instances but that is ok -> setup is just for development & testing.

When i start to do read & write in parallel things become very strange ->
the first 20-50 requests work fine, then it starts to behave strange, more
and more requests begin to hang and my log gets filled with the following
execptions:

org.elasticsearch.transport.ReceiveTimeoutTransportException: [Venus][inet[/
10.1.128.241:9300]][cluster:monitor/nodes/stats[n]] request_id [763838]
timed out after [15000ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(
TransportService.java:366)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.
java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.
java:617)
at java.lang.Thread.run(Thread.java:745)
[2015-03-11 13:55:45,681][DEBUG][action.admin.cluster.node.stats] [Achelous]
failed to execute on node [YsaywggSSv-IPGdIpV9UVw]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [Venus][inet[/
10.1.128.241:9300]][cluster:monitor/nodes/stats[n]] request_id [764432]
timed out after [15000ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(
TransportService.java:366)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.
java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.
java:617)
at java.lang.Thread.run(Thread.java:745)
[2015-03-11 13:57:45,682][DEBUG][action.admin.cluster.node.stats] [Achelous]
failed to execute on node [YsaywggSSv-IPGdIpV9UVw]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [Venus][inet[/
10.1.128.241:9300]][cluster:monitor/nodes/stats[n]] request_id [765188]
timed out after [15000ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(
TransportService.java:366)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.
java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.
java:617)
at java.lang.Thread.run(Thread.java:745)
[2015-03-11 14:00:15,683][DEBUG][action.admin.cluster.node.stats] [Achelous]
failed to execute on node [YsaywggSSv-IPGdIpV9UVw]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [Venus][inet[/
10.1.128.241:9300]][cluster:monitor/nodes/stats[n]] request_id [765704]
timed out after [15001ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(
TransportService.java:366)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.
java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.
java:617)
at java.lang.Thread.run(Thread.java:745)
[2015-03-11 14:02:44,584][INFO ][cluster.service ] [Achelous]
removed {[Venus][YsaywggSSv-IPGdIpV9UVw][46cf539b08bf][inet[/10.1.128.241:
9300]],}, reason: zen-disco-node_failed([Venus][YsaywggSSv-IPGdIpV9UVw][
46cf539b08bf][inet[/10.1.128.241:9300]]), reason transport disconnected
[2015-03-11 14:02:44,589][DEBUG][action.search.type ] [Achelous] [
19353] Failed to execute fetch phase
org.elasticsearch.search.SearchContextMissingException: No search context
found for id [19353]
at org.elasticsearch.search.SearchService.findContext(SearchService.java:
508)
at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.
java:478)
at org.elasticsearch.search.action.SearchServiceTransportAction$17.call(
SearchServiceTransportAction.java:452)
at org.elasticsearch.search.action.SearchServiceTransportAction$17.call(
SearchServiceTransportAction.java:449)
at org.elasticsearch.search.action.SearchServiceTransportAction$23.run(
SearchServiceTransportAction.java:559)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.
java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.
java:617)
at java.lang.Thread.run(Thread.java:745)
[2015-03-11 14:02:44,596][DEBUG][action.search.type ] [Achelous] [
19368] Failed to execute fetch phase
org.elasticsearch.search.SearchContextMissingException: No search context
found for id [19368]
at org.elasticsearch.search.SearchService.findContext(SearchService.java:
508)
at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.
java:478)
at org.elasticsearch.search.action.SearchServiceTransportAction$17.call(
SearchServiceTransportAction.java:452)
at org.elasticsearch.search.action.SearchServiceTransportAction$17.call(
SearchServiceTransportAction.java:449)
at org.elasticsearch.search.action.SearchServiceTransportAction$23.run(
SearchServiceTransportAction.java:559)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.
java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.
java:617)
at java.lang.Thread.run(Thread.java:745)
[2015-03-11 14:02:44,602][DEBUG][action.search.type ] [Achelous] [
19403] Failed to execute fetch phase
org.elasticsearch.search.SearchContextMissingException: No search context
found for id [19403]
at org.elasticsearch.search.SearchService.findContext(SearchService.java:
508)
at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.
java:478)
at org.elasticsearch.search.action.SearchServiceTransportAction$17.call(
SearchServiceTransportAction.java:452)
at org.elasticsearch.search.action.SearchServiceTransportAction$17.call(
SearchServiceTransportAction.java:449)
at org.elasticsearch.search.action.SearchServiceTransportAction$23.run(
SearchServiceTransportAction.java:559)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.
java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.
java:617)
at java.lang.Thread.run(Thread.java:745)
[2015-03-11 14:02:44,603][DEBUG][action.search.type ] [Achelous] [
19404] Failed to execute fetch phase
org.elasticsearch.search.SearchContextMissingException: No search context
found for id [19404]
at org.elasticsearch.search.SearchService.findContext(SearchService.java:
508)
at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.
java:478)
at org.elasticsearch.search.action.SearchServiceTransportAction$17.call(
SearchServiceTransportAction.java:452)
at org.elasticsearch.search.action.SearchServiceTransportAction$17.call(
SearchServiceTransportAction.java:449)
at org.elasticsearch.search.action.SearchServiceTransportAction$23.run(
SearchServiceTransportAction.java:559)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.
java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.
java:617)
at java.lang.Thread.run(Thread.java:745)
[2015-03-11 14:02:44,605][WARN ][search.action ] [Achelous]
Failed to send release search context
org.elasticsearch.transport.NodeDisconnectedException: [Venus][inet[/10.1.
128.241:9300]][indices:data/read/search[free_context]] disconnected
[2015-03-11 14:02:44,606][WARN ][search.action ] [Achelous]
Failed to send release search context
org.elasticsearch.transport.NodeDisconnectedException: [Venus][inet[/10.1.
128.241:9300]][indices:data/read/search[free_context]] disconnected
[2015-03-11 14:02:44,606][WARN ][search.action ] [Achelous]
Failed to send release search context
org.elasticsearch.transport.NodeDisconnectedException: [Venus][inet[/10.1.
128.241:9300]][indices:data/read/search[free_context]] disconnected
[2015-03-11 14:02:44,606][WARN ][search.action ] [Achelous]
Failed to send release search context
org.elasticsearch.transport.NodeDisconnectedException: [Venus][inet[/10.1.
128.241:9300]][indices:data/read/search[free_context]] disconnected
[2015-03-11 14:02:44,614][WARN ][search.action ] [Achelous]
Failed to send release search context
org.elasticsearch.transport.NodeDisconnectedException: [Venus][inet[/10.1.
128.241:9300]][indices:data/read/search[free_context]] disconnected
[2015-03-11 14:02:44,615][WARN ][search.action ] [Achelous]
Failed to send release search context
org.elasticsearch.transport.NodeDisconnectedException: [Venus][inet[/10.1.
128.241:9300]][indices:data/read/search[free_context]] disconnected
[2015-03-11 14:02:44,615][WARN ][search.action ] [Achelous]
Failed to send release search context
org.elasticsearch.transport.NodeDisconnectedException: [Venus][inet[/10.1.
128.241:9300]][indices:data/read/search[free_context]] disconnected
[2015-03-11 14:02:44,615][WARN ][search.action ] [Achelous]
Failed to send release search context
org.elasticsearch.transport.NodeDisconnectedException: [Venus][inet[/10.1.
128.241:9300]][indices:data/read/search[free_context]] disconnected
[2015-03-11 14:02:44,615][WARN ][search.action ] [Achelous]
Failed to send release search context
org.elasticsearch.transport.NodeDisconnectedException: [Venus][inet[/10.1.
128.241:9300]][indices:data/read/search[free_context]] disconnected
[2015-03-11 14:02:44,703][DEBUG][action.admin.indices.stats] [Achelous] [
attendee-dev][3], node[YsaywggSSv-IPGdIpV9UVw], [R], s[STARTED]: failed to
execute [org.elasticsearch.action.admin.indices.stats.
IndicesStatsRequest@2f5eeaae]
org.elasticsearch.transport.NodeDisconnectedException: [Venus][inet[/10.1.
128.241:9300]][indices:monitor/stats[s]] disconnected
[2015-03-11 14:02:44,706][WARN ][search.action ] [Achelous]
Failed to send release search context
org.elasticsearch.transport.SendRequestTransportException: [Venus][inet[/
10.1.128.241:9300]][indices:data/read/search[free_context]]
at org.elasticsearch.transport.TransportService.sendRequest(
TransportService.java:213)
at org.elasticsearch.transport.TransportService.sendRequest(
TransportService.java:183)
at org.elasticsearch.search.action.SearchServiceTransportAction.
sendFreeContext(SearchServiceTransportAction.java:143)
at org.elasticsearch.action.search.type.
TransportSearchTypeAction$BaseAsyncAction.releaseIrrelevantSearchContexts(
TransportSearchTypeAction.java:341)
at org.elasticsearch.action.search.type.
TransportSearchQueryThenFetchAction$AsyncAction$2.run(
TransportSearchQueryThenFetchAction.java:158)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.
java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.
java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.transport.NodeNotConnectedException: [Venus][
inet[/10.1.128.241:9300]] Node not connected
at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(
NettyTransport.java:946)
at org.elasticsearch.transport.netty.NettyTransport.sendRequest(
NettyTransport.java:640)
at org.elasticsearch.transport.TransportService.sendRequest(
TransportService.java:199)
... 7 more
[2015-03-11 14:02:44,706][WARN ][search.action ] [Achelous]
Failed to send release search context
org.elasticsearch.transport.NodeDisconnectedException: [Venus][inet[/10.1.
128.241:9300]][indices:data/read/search[free_context]] disconnected
[2015-03-11 14:02:44,675][WARN ][search.action ] [Achelous]
Failed to send release search context
org.elasticsearch.transport.SendRequestTransportException: [Venus][inet[/
10.1.128.241:9300]][indices:data/read/search[free_context]]
at org.elasticsearch.transport.TransportService.sendRequest(
TransportService.java:213)
at org.elasticsearch.transport.TransportService.sendRequest(
TransportService.java:183)
at org.elasticsearch.search.action.SearchServiceTransportAction.
sendFreeContext(SearchServiceTransportAction.java:143)
at org.elasticsearch.action.search.type.
TransportSearchTypeAction$BaseAsyncAction.releaseIrrelevantSearchContexts(
TransportSearchTypeAction.java:341)
at org.elasticsearch.action.search.type.
TransportSearchQueryThenFetchAction$AsyncAction$2.run(
TransportSearchQueryThenFetchAction.java:158)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.
java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.
java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.transport.NodeNotConnectedException: [Venus][
inet[/10.1.128.241:9300]] Node not connected
at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(
NettyTransport.java:946)
at org.elasticsearch.transport.netty.NettyTransport.sendRequest(
NettyTransport.java:640)

The really strange thing is that this does not happen when running a single
instance of elasticsearch - it gets slow (as expected) but no errors and no
hangs.
The errors begin to occur at a system load of about 0.6-1.2 -> my single
node test caused loads of 8-10+ and did not cause errors... also using
faster EC2 instacnes m3.medium etc. did not change anything -> so
CPU/Memory can't be the reason - the data set itself is tiny (~5000
documents in 3 indices)

If anyone here has an idea about what may cause this issue please, please
let me know - i already spent a whole day trying to track this issue down
by trying differen ES version config settings and so on.

thank you very much in advance,

kind regards,
Christian

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/55b06673-92c6-4c10-ba36-86699e3249de%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.