We have a cluster of 4 nodes; each index is created with 5 shards and 1 replica per shard.
We have two clients indexing a large number of documents. One client successfully indexed more than 100 documents, but the second client hung on its first document. From the logs we can see that the master node tried to replicate this document to our fourth node. The fourth node threw an IndexMissingException and hung (we waited more than 7 minutes). The client that requested the indexing also hung and never timed out. We've reproduced this several times (starting from a clean work folder).
A note about how we're indexing: we use each document's publication date as the index name, so the number of indices will grow over time (as recommended in http://elasticsearch-users.115913.n3.nabble.com/Changing-the-number-of-shards-and-replicas-of-an-existing-index-td413204.html).
Is this an issue on the server side?
Config & log details below:
Client-side code to connect to the cluster:

Map<String, String> m = new HashMap<String, String>();
m.put("cluster.name", "myCluster");
Settings s = ImmutableSettings.settingsBuilder().put(m).build();
// FIXME: get hosts from configuration
// TODO: check if client API is thread safe
client = new TransportClient(s)
    .addTransportAddress(new InetSocketTransportAddress("10.177.163.72", 9300))
    .addTransportAddress(new InetSocketTransportAddress("10.177.162.65", 9300))
    .addTransportAddress(new InetSocketTransportAddress("10.177.162.57", 9300))
    .addTransportAddress(new InetSocketTransportAddress("10.177.164.95", 9300));
Client-side indexing code:

String index = new SimpleDateFormat("yyyyMMdd").format(doc.getPublishedTime());
String type = lang.getName();
String id = doc.getFeedId();
client.prepareIndex(index, type, id)
    .setOperationThreaded(false)
    .setSource(jsonBuilder()
        .startObject()
        .field("id", doc.getFeedId())
        .field("publishDate", doc.getPublishedTime())
        .field("content", doc.getContent())
        .endObject())
    .execute()
    .actionGet();
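Since the client hung without ever timing out, one thing we are going to try is a bounded wait on the client side so the indexing call cannot block forever. A rough sketch (assuming the actionGet overload that takes a timeout is available in 0.12; the 30-second value is arbitrary):

// Sketch only: same index request as above, but with a client-side timeout
// so a stuck replica cannot hang the indexing thread indefinitely.
// Needs java.util.concurrent.TimeUnit.
IndexResponse response = client.prepareIndex(index, type, id)
    .setOperationThreaded(false)
    .setSource(jsonBuilder()
        .startObject()
        .field("id", doc.getFeedId())
        .field("publishDate", doc.getPublishedTime())
        .field("content", doc.getContent())
        .endObject())
    .execute()
    .actionGet(30, TimeUnit.SECONDS); // throws if the request does not complete in time

Of course this only bounds the client-side wait; it would not fix whatever is going wrong on the fourth node.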
elasticsearch.yml (Server 1; the other nodes use their own publish_host):

cluster:
  name: myCluster

network:
  publish_host: 10.177.163.72

discovery.zen.ping.unicast:
  hosts: ["10.177.163.72:9300", "10.177.162.65:9300", "10.177.162.57:9300", "10.177.164.95:9300"]
Master (Server 1)
[2010-11-11 10:16:54,340][INFO ][node ] [Xorr the God-Jewel] {elasticsearch/0.12.0}[10020]: initializing ...
[2010-11-11 10:16:54,350][INFO ][plugins ] [Xorr the God-Jewel] loaded []
[2010-11-11 10:16:56,070][INFO ][node ] [Xorr the God-Jewel] {elasticsearch/0.12.0}[10020]: initialized
[2010-11-11 10:16:56,070][INFO ][node ] [Xorr the God-Jewel] {elasticsearch/0.12.0}[10020]: starting ...
[2010-11-11 10:16:56,150][INFO ][transport ] [Xorr the God-Jewel] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.177.163.72:9300]}
[2010-11-11 10:16:59,410][INFO ][cluster.service ] [Xorr the God-Jewel] new_master [Xorr the God-Jewel][QQho7EckQoOet726kVPG_Q][inet[/10.177.163.72:9300]], reason: zen-disco-join (elected_as_master)
[2010-11-11 10:16:59,420][INFO ][discovery ] [Xorr the God-Jewel] SocialMetrix/QQho7EckQoOet726kVPG_Q
[2010-11-11 10:16:59,430][INFO ][http ] [Xorr the God-Jewel] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/10.177.163.72:9200]}
[2010-11-11 10:16:59,430][INFO ][node ] [Xorr the God-Jewel] {elasticsearch/0.12.0}[10020]: started
[2010-11-11 10:17:07,080][INFO ][cluster.service ] [Xorr the God-Jewel] added {[Arlok][79lzFsNVRZWTJBOAm81PYw][inet[/10.177.162.65:9300]],}, reason: zen-disco-receive(from node[[Arlok][79lzFsNVRZWTJBOAm81PYw][inet[/10.177.162.65:9300]]])
[2010-11-11 10:17:13,460][INFO ][cluster.service ] [Xorr the God-Jewel] added {[Blaze, Siena][2reBSun3RZq-qCqOrnhQiw][inet[/10.177.162.57:9300]],}, reason: zen-disco-receive(from node[[Blaze, Siena][2reBSun3RZq-qCqOrnhQiw][inet[/10.177.162.57:9300]]])
[2010-11-11 10:17:20,440][INFO ][cluster.service ] [Xorr the God-Jewel] added {[Sefton, Amanda][luCj2bn8R6GVSAk0AJSYfQ][inet[/10.177.164.95:9300]],}, reason: zen-disco-receive(from node[[Sefton, Amanda][luCj2bn8R6GVSAk0AJSYfQ][inet[/10.177.164.95:9300]]])
[2010-11-11 10:20:27,380][INFO ][cluster.metadata ] [Xorr the God-Jewel] [20101027] creating index, cause [auto(index api)], shards [5]/[1], mappings []
[2010-11-11 10:20:28,410][INFO ][cluster.metadata ] [Xorr the God-Jewel] [20101027] created and added to cluster_state
[2010-11-11 10:20:28,900][INFO ][cluster.metadata ] [Xorr the God-Jewel] [20101027] update_mapping [en] (dynamic)
... successfully processed others
[2010-11-11 10:20:40,730][INFO ][cluster.metadata ] [Xorr the God-Jewel] [20081016] creating index, cause [auto(index api)], shards [5]/[1], mappings []
[2010-11-11 10:20:40,740][INFO ][cluster.metadata ] [Xorr the God-Jewel] [20081027] update_mapping [es] (dynamic)
[2010-11-11 10:20:40,890][INFO ][cluster.metadata ] [Xorr the God-Jewel] [20081016] created and added to cluster_state
[2010-11-11 10:20:40,960][INFO ][cluster.metadata ] [Xorr the God-Jewel] [20081016] update_mapping [es] (dynamic)
... continued processing others
Slave (Server 4)
[2010-11-11 10:17:15,410][INFO ][node ] [Sefton, Amanda] {elasticsearch/0.12.0}[5404]: initializing ...
[2010-11-11 10:17:15,410][INFO ][plugins ] [Sefton, Amanda] loaded []
[2010-11-11 10:17:17,120][INFO ][node ] [Sefton, Amanda] {elasticsearch/0.12.0}[5404]: initialized
[2010-11-11 10:17:17,120][INFO ][node ] [Sefton, Amanda] {elasticsearch/0.12.0}[5404]: starting ...
[2010-11-11 10:17:17,190][INFO ][transport ] [Sefton, Amanda] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.177.164.95:9300]}
[2010-11-11 10:17:20,450][INFO ][cluster.service ] [Sefton, Amanda] detected_master [Xorr the God-Jewel][QQho7EckQoOet726kVPG_Q][inet[/10.177.163.72:9300]], added {[Arlok][79lzFsNVRZWTJBOAm81PYw][inet[/10.177.162.65:9300]],[Xorr the God-Jewel][QQho7EckQoOet726kVPG_Q][inet[/10.177.163.72:9300]],[Blaze, Siena][2reBSun3RZq-qCqOrnhQiw][inet[/10.177.162.57:9300]],}, reason: zen-disco-receive(from [[Xorr the God-Jewel][QQho7EckQoOet726kVPG_Q][inet[/10.177.163.72:9300]]])
[2010-11-11 10:17:20,450][INFO ][discovery ] [Sefton, Amanda] SocialMetrix/luCj2bn8R6GVSAk0AJSYfQ
[2010-11-11 10:17:20,460][INFO ][http ] [Sefton, Amanda] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/10.177.164.95:9200]}
[2010-11-11 10:17:20,460][INFO ][node ] [Sefton, Amanda] {elasticsearch/0.12.0}[5404]: started
[2010-11-11 10:20:40,730][ERROR][transport.netty ] [Sefton, Amanda] Failed to handle exception response
org.elasticsearch.indices.IndexMissingException: [20081016] missing
    at org.elasticsearch.cluster.metadata.MetaData.concreteIndex(MetaData.java:178)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.<init>(TransportShardReplicationOperationAction.java:232)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.<init>(TransportShardReplicationOperationAction.java:212)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction.doExecute(TransportShardReplicationOperationAction.java:93)
    at org.elasticsearch.action.index.TransportIndexAction.access$101(TransportIndexAction.java:62)
    at org.elasticsearch.action.index.TransportIndexAction$1.onFailure(TransportIndexAction.java:99)
    at org.elasticsearch.action.support.master.TransportMasterNodeOperationAction$3.handleException(TransportMasterNodeOperationAction.java:175)
    at org.elasticsearch.transport.netty.MessageChannelHandler$2.run(MessageChannelHandler.java:169)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Process info on Server 4 (sometimes when the server gets into this state, the CPU climbs to about 80% and stays there):
root 5404 35.8 61.9 1657240 1304856 ? Sl 10:17 24:01 /usr/lib/jvm/java-6-sun/jre/bin/java
  -Delasticsearch-service -Des-foreground=yes -Des.path.home=/opt/elasticsearch -Djline.enabled=true
  -XX:+AggressiveOpts -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
  -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:+HeapDumpOnOutOfMemoryError
  -Xms256m -Xmx1024m -Djava.library.path=/opt/elasticsearch/bin/service/lib
  -classpath /opt/elasticsearch/bin/service/lib/wrapper.jar:/opt/elasticsearch/lib/elasticsearch-0.12.0.jar:/opt/elasticsearch/lib/jline-0.9.94.jar:/opt/elasticsearch/lib/log4j-1.2.15.jar:/opt/elasticsearch/lib/lucene-analyzers-3.0.2.jar:/opt/elasticsearch/lib/lucene-core-3.0.2.jar:/opt/elasticsearch/lib/lucene-fast-vector-highlighter-3.0.2.jar:/opt/elasticsearch/lib/lucene-queries-3.0.2.jar:/opt/elasticsearch/lib/sigar/sigar-1.6.3.jar
  -Dwrapper.key=T9MeKiQTKVe0hkUp -Dwrapper.port=32000 -Dwrapper.jvm.port.min=31000 -Dwrapper.jvm.port.max=31999
  -Dwrapper.disable_console_input=TRUE -Dwrapper.pid=5402 -Dwrapper.version=3.4.0
  -Dwrapper.native_library=wrapper -Dwrapper.service=TRUE -Dwrapper.cpu.timeout=10 -Dwrapper.jvmid=1
  org.tanukisoftware.wrapper.WrapperSimpleApp org.elasticsearch.bootstrap.Bootstrap
The servers have 2 GB of physical memory and we're giving Elasticsearch 1 GB (-Xmx1024m), so we don't expect memory to be an issue.
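One more thing we plan to try on the client, mostly to tell whether the hang is related to the cluster not being ready, is to wait for the cluster to report at least yellow health before the two indexing clients start. A rough sketch; we have not verified this API against 0.12, so treat the Requests.clusterHealthRequest()/waitForYellowStatus() calls as an assumption on our part:

// Sketch only: block (with a bounded wait) until the cluster reports at
// least yellow health before the two indexing clients are started.
// Assumes the admin cluster health API (Requests.clusterHealthRequest()
// and waitForYellowStatus()) is available in this version.
ClusterHealthResponse health = client.admin().cluster()
    .health(Requests.clusterHealthRequest().waitForYellowStatus())
    .actionGet(30, TimeUnit.SECONDS);
// if this does not return normally, the cluster was not ready and we
// would not start indexing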