Elasticsearch unresponsive with too many users

Elasticsearch randomly becomes unresponsive when it receives too many requests (approx. 250 requests per second), and I have no idea why.

Only the "indice" index is being used.

I tried increasing the node heap size, to no avail.
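For reference, this is roughly how the heap was raised (a sketch assuming the stock 2.4 package install, where the value comes from the service environment file; the exact file and variable names depend on the distribution):

# /etc/sysconfig/elasticsearch (RPM-style install; file location is an assumption)
ES_HEAP_SIZE=10g             # matches the ~10 GB heap_init reported by _nodes below
MAX_LOCKED_MEMORY=unlimited  # needed because bootstrap.memory_lock is enabled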

_cat/shards

indice_sessao 2 p STARTED 3 39.8kb 127.0.0.1 Count Abyss
indice_sessao 2 r UNASSIGNED
indice_sessao 1 p STARTED 1 7.7kb 127.0.0.1 Count Abyss
indice_sessao 1 r UNASSIGNED
indice_sessao 4 p STARTED 1 7.7kb 127.0.0.1 Count Abyss
indice_sessao 4 r UNASSIGNED
indice_sessao 3 p STARTED 0 160b 127.0.0.1 Count Abyss
indice_sessao 3 r UNASSIGNED
indice_sessao 0 p STARTED 3 18.3kb 127.0.0.1 Count Abyss
indice_sessao 0 r UNASSIGNED
indice 1 p STARTED 1512 4.1mb 127.0.0.1 Count Abyss
indice 1 r UNASSIGNED
indice 2 p STARTED 1539 7.9mb 127.0.0.1 Count Abyss
indice 2 r UNASSIGNED
indice 4 p STARTED 1558 9.1mb 127.0.0.1 Count Abyss
indice 4 r UNASSIGNED
indice 3 p STARTED 1502 8.3mb 127.0.0.1 Count Abyss
indice 3 r UNASSIGNED
indice 0 p STARTED 1518 8.6mb 127.0.0.1 Count Abyss
indice 0 r UNASSIGNED
pleasereadthis 1 p STARTED 0 160b 127.0.0.1 Count Abyss
pleasereadthis 1 r UNASSIGNED
pleasereadthis 2 p STARTED 0 160b 127.0.0.1 Count Abyss
pleasereadthis 2 r UNASSIGNED
pleasereadthis 4 p STARTED 0 160b 127.0.0.1 Count Abyss
pleasereadthis 4 r UNASSIGNED
pleasereadthis 3 p STARTED 0 160b 127.0.0.1 Count Abyss
pleasereadthis 3 r UNASSIGNED
pleasereadthis 0 p STARTED 0 160b 127.0.0.1 Count Abyss
pleasereadthis 0 r UNASSIGNED

_cat/shards?h=index,shard,prirep,state,unassigned.reason

indice_sessao 2 p STARTED
indice_sessao 2 r UNASSIGNED CLUSTER_RECOVERED
indice_sessao 1 p STARTED
indice_sessao 1 r UNASSIGNED CLUSTER_RECOVERED
indice_sessao 4 p STARTED
indice_sessao 4 r UNASSIGNED CLUSTER_RECOVERED
indice_sessao 3 p STARTED
indice_sessao 3 r UNASSIGNED CLUSTER_RECOVERED
indice_sessao 0 p STARTED
indice_sessao 0 r UNASSIGNED CLUSTER_RECOVERED
indice 1 p STARTED
indice 1 r UNASSIGNED CLUSTER_RECOVERED
indice 2 p STARTED
indice 2 r UNASSIGNED CLUSTER_RECOVERED
indice 4 p STARTED
indice 4 r UNASSIGNED CLUSTER_RECOVERED
indice 3 p STARTED
indice 3 r UNASSIGNED CLUSTER_RECOVERED
indice 0 p STARTED
indice 0 r UNASSIGNED CLUSTER_RECOVERED
pleasereadthis 1 p STARTED
pleasereadthis 1 r UNASSIGNED CLUSTER_RECOVERED
pleasereadthis 2 p STARTED
pleasereadthis 2 r UNASSIGNED CLUSTER_RECOVERED
pleasereadthis 4 p STARTED
pleasereadthis 4 r UNASSIGNED CLUSTER_RECOVERED
pleasereadthis 3 p STARTED
pleasereadthis 3 r UNASSIGNED CLUSTER_RECOVERED
pleasereadthis 0 p STARTED
pleasereadthis 0 r UNASSIGNED CLUSTER_RECOVERED

_nodes

{"cluster_name":"elasticsearch","nodes":{"nru50EUwRxako9cT2fhdpQ":{"name":"Count Abyss","transport_address":"127.0.0.1:9300","host":"127.0.0.1","ip":"127.0.0.1","version":"2.4.0","build":"ce9f0c7","http_address":"127.0.0.1:9200","settings":{"bootstrap":{"memory_lock":"true"},"client":{"type":"node"},"name":"Count Abyss","pidfile":"/var/run/elasticsearch/elasticsearch.pid","path":{"data":"/var/lib/elasticsearch","home":"/usr/share/elasticsearch","conf":"/etc/elasticsearch","logs":"/var/log/elasticsearch"},"cluster":{"name":"elasticsearch"},"config":{"ignore_system_properties":"true"},"indices":{"cache":{"query":{"size":"2%"}}},"script":{"indexed":"on","engine":{"groovy":{"inline":{"aggs":"on"}}},"inline":"true"},"foreground":"false"},"os":{"refresh_interval_in_millis":1000,"name":"Linux","arch":"amd64","version":"4.4.19-29.55.amzn1.x86_64","available_processors":40,"allocated_processors":32},"process":{"refresh_interval_in_millis":1000,"id":50050,"mlockall":true},"jvm":{"pid":50050,"version":"1.7.0_111","vm_name":"OpenJDK 64-Bit Server VM","vm_version":"24.111-b01","vm_vendor":"Oracle Corporation","start_time_in_millis":1505831346434,"mem":{"heap_init_in_bytes":10737418240,"heap_max_in_bytes":10493165568,"non_heap_init_in_bytes":24313856,"non_heap_max_in_bytes":224395264,"direct_max_in_bytes":10493165568},"gc_collectors":["ParNew","ConcurrentMarkSweep"],"memory_pools":["Code Cache","Par Eden Space","Par Survivor Space","CMS Old Gen","CMS Perm Gen"],"using_compressed_ordinary_object_pointers":"true"},"thread_pool":{"generic":{"type":"cached","keep_alive":"30s","queue_size":-1},"index":{"type":"fixed","min":32,"max":32,"queue_size":200},"fetch_shard_store":{"type":"scaling","min":1,"max":64,"keep_alive":"5m","queue_size":-1},"get":{"type":"fixed","min":32,"max":32,"queue_size":1000},"snapshot":{"type":"scaling","min":1,"max":5,"keep_alive":"5m","queue_size":-1},"force_merge":{"type":"fixed","min":1,"max":1,"queue_size":-1},"suggest":{"type":"fixed","min":32,"max":32,"queue_size":1000},"bulk":{"type":"fixed","min":32,"max":32,"queue_size":50},"warmer":{"type":"scaling","min":1,"max":5,"keep_alive":"5m","queue_size":-1},"flush":{"type":"scaling","min":1,"max":5,"keep_alive":"5m","queue_size":-1},"search":{"type":"fixed","min":49,"max":49,"queue_size":1000},"fetch_shard_started":{"type":"scaling","min":1,"max":64,"keep_alive":"5m","queue_size":-1},"listener":{"type":"fixed","min":10,"max":10,"queue_size":-1},"percolate":{"type":"fixed","min":32,"max":32,"queue_size":1000},"refresh":{"type":"scaling","min":1,"max":10,"keep_alive":"5m","queue_size":-1},"management":{"type":"scaling","min":1,"max":5,"keep_alive":"5m","queue_size":-1}},"transport":{"bound_address":["127.0.0.1:9300","[::1]:9300"],"publish_address":"127.0.0.1:9300","profiles":{}},"http":{"bound_address":["127.0.0.1:9200","[::1]:9200"],"publish_address":"127.0.0.1:9200","max_content_length_in_bytes":104857600},"plugins":[],"modules":[{"name":"lang-expression","version":"2.4.0","description":"Lucene expressions integration for Elasticsearch","jvm":true,"classname":"org.elasticsearch.script.expression.ExpressionPlugin","isolated":true,"site":false},{"name":"lang-groovy","version":"2.4.0","description":"Groovy scripting integration for Elasticsearch","jvm":true,"classname":"org.elasticsearch.script.groovy.GroovyPlugin","isolated":true,"site":false},{"name":"reindex","version":"2.4.0","description":"_reindex and _update_by_query APIs","jvm":true,"classname":"org.elasticsearch.index.reindex.ReindexPlugin","isolated":true,"site":false}]}}}[

Part of the log file from today:

[2017-09-19 10:18:11,031][INFO ][node ] [Ozone] stopping ...
[2017-09-19 10:18:11,301][INFO ][node ] [Ozone] stopped
[2017-09-19 10:18:11,301][INFO ][node ] [Ozone] closing ...
[2017-09-19 10:18:11,311][INFO ][node ] [Ozone] closed
[2017-09-19 10:18:16,216][INFO ][node ] [Thunderbird] version[2.4.0], pid[39862], build[ce9f0c7/2016-08-29T09:14:17Z]
[2017-09-19 10:18:16,216][INFO ][node ] [Thunderbird] initializing ...
[2017-09-19 10:18:16,679][INFO ][plugins ] [Thunderbird] modules [lang-groovy, reindex, lang-expression], plugins [], sites []
[2017-09-19 10:18:16,702][INFO ][env ] [Thunderbird] using [1] data paths, mounts [[/ (/dev/xvda1)]], net usable_space [31.6gb], net total_space [49gb], spins? [no], types [ext4]
[2017-09-19 10:18:16,702][INFO ][env ] [Thunderbird] heap size [9.7gb], compressed ordinary object pointers [true]
[2017-09-19 10:18:18,324][INFO ][node ] [Thunderbird] initialized
[2017-09-19 10:18:18,325][INFO ][node ] [Thunderbird] starting ...
[2017-09-19 10:18:18,469][INFO ][transport ] [Thunderbird] publish_address {127.0.0.1:9300}, bound_addresses {127.0.0.1:9300}, {[::1]:9300}
[2017-09-19 10:18:18,473][INFO ][discovery ] [Thunderbird] elasticsearch/qzls17hQRvCRWaeJMzusTg
[2017-09-19 10:18:21,506][INFO ][cluster.service ] [Thunderbird] new_master {Thunderbird}{qzls17hQRvCRWaeJMzusTg}{127.0.0.1}{127.0.0.1:9300}, reason: zen-disco-join(elected_as_master, [0] joins received)
[2017-09-19 10:18:21,548][INFO ][http ] [Thunderbird] publish_address {127.0.0.1:9200}, bound_addresses {127.0.0.1:9200}, {[::1]:9200}
[2017-09-19 10:18:21,549][INFO ][node ] [Thunderbird] started
[2017-09-19 10:18:21,583][DEBUG][action.search ] [Thunderbird] All shards failed for phase: [query]
[indice][[indice][4]] NoShardAvailableActionException[null]
at org.elasticsearch.action.search.AbstractSearchAsyncAction.start(AbstractSearchAsyncAction.java:129)
at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:115)
at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:47)
at org.elasticsearch.action.support.TransportAction.doExecute(TransportAction.java:149)
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:137)
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:85)
at org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:58)
at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:359)
at org.elasticsearch.client.FilterClient.doExecute(FilterClient.java:52)
at org.elasticsearch.rest.BaseRestHandler$HeadersAndContextCopyClient.doExecute(BaseRestHandler.java:88)
at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:359)
at org.elasticsearch.client.support.AbstractClient.search(AbstractClient.java:582)
at org.elasticsearch.rest.action.search.RestSearchAction.handleRequest(RestSearchAction.java:85)
at org.elasticsearch.rest.BaseRestHandler.handleRequest(BaseRestHandler.java:54)
at org.elasticsearch.rest.RestController.executeHandler(RestController.java:198)
at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:158)
at org.elasticsearch.http.HttpServer.internalDispatchRequest(HttpServer.java:153)
at org.elasticsearch.http.HttpServer$Dispatcher.dispatchRequest(HttpServer.java:101)
at org.elasticsearch.http.netty.NettyHttpServerTransport.dispatchRequest(NettyHttpServerTransport.java:451)
at org.elasticsearch.http.netty.HttpRequestHandler.messageReceived(HttpRequestHandler.java:61)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.elasticsearch.http.netty.pipelining.HttpPipeliningHandler.messageReceived(HttpPipeliningHandler.java:60)
at org.jboss.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:88)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:145)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.jboss.netty.handler.codec.http.HttpContentDecoder.messageReceived(HttpContentDecoder.java:108)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:75)
at org.jbos

As the shards for that index are quite small, you may be able to handle more concurrent queries on your node if you shrink that index down to 1 primary shard, either by reindexing or using the shrink index API.
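A minimal sketch of the reindex route, assuming a hypothetical target index named indice_v2 (the _nodes output above shows the reindex module is installed); any custom mappings would need to be put on the new index first:

PUT /indice_v2
{ "settings": { "number_of_shards": 1, "number_of_replicas": 0 } }

POST /_reindex
{ "source": { "index": "indice" }, "dest": { "index": "indice_v2" } }

Once the copy is complete, an alias (or the application configuration) can point searches at the new index.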


How can I limit the number of shards before reindexing?

Could these unassigned replica shards be contributing to the problem?

Thank you.

As it looks like you only have one server, all replica shards will always stay unassigned. You can control the number of primary shards through an index template.
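A sketch of such a template for 2.x, with a hypothetical template name and a pattern matching the index names above; number_of_replicas is set to 0 since replicas can never be assigned on a single node anyway:

PUT /_template/single_shard
{
  "template": "indice*",
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}

Note that a template only applies to indices created after it is added; existing indices still need to be reindexed.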


Thanks. Just one more question: why would it be able to handle more concurrent queries with only one shard?

A task is basically created per shard for each query, so with fewer tasks your queues will not fill up as quickly. Larger shards could result in higher latencies, but that may be offset by better query throughput. If you still need higher query throughput, you can scale your cluster up or out.
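If you want to confirm that the search queue is what fills up, you can watch the search thread pool while the node is under load; something like this with the 2.x _cat API:

GET /_cat/thread_pool?v&h=host,search.active,search.queue,search.rejected

A climbing search.rejected count would suggest queries are being rejected because the 1000-entry search queue (visible in your _nodes output) is full.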


Thank you. There are lots of queries (hence the ~250 QPS), but documents are stored very rarely. Should I take that into account when scaling or configuring the cluster?
