IndexMissingException on Client.prepareIndex call


(ChrisM) #1

We have a cluster of 4 nodes, each with 5 shards and 1 replica/shard.
We have two clients trying to index a lot of documents. One of the
clients had successfully indexed >100 documents but the second client
hung on the first document. From the logs, we can see that the Master
node had tried to replicate this to our fourth node. The fourth node
had the IndexMissingException and hung (we waited > 7 minutes). The
client that requested the indexing also hung and did not time out
either. We've reproduced this case several times (starting with a
clean work folder).
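
Since the indexing call blocked with no timeout, one way to keep a client thread from hanging forever is to put a bound on the wait. The following is only a minimal sketch using plain java.util.concurrent; it does not use any Elasticsearch API, and the class and method names (BoundedCall, callWithTimeout) are hypothetical. The blocking indexing call would be the task passed in.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: runs a blocking task on a worker thread and gives up
// after the given timeout instead of waiting indefinitely.
class BoundedCall {
    static <T> T callWithTimeout(Callable<T> task, long timeout, TimeUnit unit) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<T> future = pool.submit(task);
        try {
            // Waits at most `timeout`; throws TimeoutException if the task is still running.
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker so it does not linger
            throw e;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // A task that finishes quickly returns its result normally.
        System.out.println(callWithTimeout(() -> "indexed", 1, TimeUnit.SECONDS));
        // A task that blocks too long surfaces as a TimeoutException.
        try {
            callWithTimeout(() -> { Thread.sleep(5000); return "late"; }, 100, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            System.out.println("gave up waiting");
        }
    }
}
```

This only protects the caller; it does not explain or fix the hang on the server side.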

A note about how we're indexing: we're using the document's publication
date as the index so our index will grow in the future (as recommended
in http://elasticsearch-users.115913.n3.nabble.com/Changing-the-number-of-shards-and-replicas-of-an-existing-index-td413204.html).

Is this an issue on the server side?

Config & log details below:

Client side code to connect to the cluster:
    Map<String,String> m = new HashMap<String,String>();
    m.put("cluster.name", "myCluster");
    Settings s = ImmutableSettings.settingsBuilder().put(m).build();

    // FIXME: get hosts from configuration
    // TODO: check if client API is thread safe
    client = new TransportClient(s)
        .addTransportAddress(new InetSocketTransportAddress("10.177.163.72", 9300))
        .addTransportAddress(new InetSocketTransportAddress("10.177.162.65", 9300))
        .addTransportAddress(new InetSocketTransportAddress("10.177.162.57", 9300))
        .addTransportAddress(new InetSocketTransportAddress("10.177.164.95", 9300));

Client side indexing code:
    String index = new SimpleDateFormat("yyyyMMdd").format(doc.getPublishedTime());
    String type = lang.getName();
    String id = doc.getFeedId();

    client.prepareIndex(index, type, id)
        .setOperationThreaded(false)
        .setSource(jsonBuilder()
            .startObject()
                .field("id", doc.getFeedId())
                .field("publishDate", doc.getPublishedTime())
                .field("content", doc.getContent())
            .endObject()
        )
        .execute()
        .actionGet();
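
The index name above comes from formatting the publish date as yyyyMMdd. As a minimal sketch of that step on its own, assuming getPublishedTime() returns a java.util.Date (the thread does not say), the helper below creates a fresh SimpleDateFormat per call, because SimpleDateFormat is not thread-safe, and pins the time zone to UTC so that every client maps the same instant to the same index. The class name DailyIndexName and the UTC choice are assumptions for illustration, not part of the original code.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper: derives a per-day index name such as "20081016".
class DailyIndexName {
    static String forDate(Date published) {
        // New formatter per call: SimpleDateFormat must not be shared across threads.
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd");
        // Assumption: use UTC so the name does not depend on each client's default zone.
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(published);
    }

    public static void main(String[] args) {
        // The Unix epoch falls on 1970-01-01 in UTC.
        System.out.println(forDate(new Date(0L))); // prints 19700101
    }
}
```

With two clients indexing in parallel, a shared formatter or differing default time zones could produce inconsistent index names; whether that plays any role in the hang described here is not established.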

elasticsearch.yml
cluster:
  name: myCluster
network:
  publish_host: 10.177.163.72
discovery.zen.ping.unicast:
  hosts: ["10.177.163.72:9300", "10.177.162.65:9300", "10.177.162.57:9300", "10.177.164.95:9300"]

Master (Server 1)
[2010-11-11 10:16:54,340][INFO ][node ] [Xorr the God-Jewel] {elasticsearch/0.12.0}[10020]: initializing ...
[2010-11-11 10:16:54,350][INFO ][plugins ] [Xorr the God-Jewel] loaded []
[2010-11-11 10:16:56,070][INFO ][node ] [Xorr the God-Jewel] {elasticsearch/0.12.0}[10020]: initialized
[2010-11-11 10:16:56,070][INFO ][node ] [Xorr the God-Jewel] {elasticsearch/0.12.0}[10020]: starting ...
[2010-11-11 10:16:56,150][INFO ][transport ] [Xorr the God-Jewel] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.177.163.72:9300]}
[2010-11-11 10:16:59,410][INFO ][cluster.service ] [Xorr the God-Jewel] new_master [Xorr the God-Jewel][QQho7EckQoOet726kVPG_Q][inet[/10.177.163.72:9300]], reason: zen-disco-join (elected_as_master)
[2010-11-11 10:16:59,420][INFO ][discovery ] [Xorr the God-Jewel] SocialMetrix/QQho7EckQoOet726kVPG_Q
[2010-11-11 10:16:59,430][INFO ][http ] [Xorr the God-Jewel] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/10.177.163.72:9200]}
[2010-11-11 10:16:59,430][INFO ][node ] [Xorr the God-Jewel] {elasticsearch/0.12.0}[10020]: started
[2010-11-11 10:17:07,080][INFO ][cluster.service ] [Xorr the God-Jewel] added {[Arlok][79lzFsNVRZWTJBOAm81PYw][inet[/10.177.162.65:9300]],}, reason: zen-disco-receive(from node[[Arlok][79lzFsNVRZWTJBOAm81PYw][inet[/10.177.162.65:9300]]])
[2010-11-11 10:17:13,460][INFO ][cluster.service ] [Xorr the God-Jewel] added {[Blaze, Siena][2reBSun3RZq-qCqOrnhQiw][inet[/10.177.162.57:9300]],}, reason: zen-disco-receive(from node[[Blaze, Siena][2reBSun3RZq-qCqOrnhQiw][inet[/10.177.162.57:9300]]])
[2010-11-11 10:17:20,440][INFO ][cluster.service ] [Xorr the God-Jewel] added {[Sefton, Amanda][luCj2bn8R6GVSAk0AJSYfQ][inet[/10.177.164.95:9300]],}, reason: zen-disco-receive(from node[[Sefton, Amanda][luCj2bn8R6GVSAk0AJSYfQ][inet[/10.177.164.95:9300]]])
[2010-11-11 10:20:27,380][INFO ][cluster.metadata ] [Xorr the God-Jewel] [20101027] creating index, cause [auto(index api)], shards [5]/[1], mappings []
[2010-11-11 10:20:28,410][INFO ][cluster.metadata ] [Xorr the God-Jewel] [20101027] created and added to cluster_state
[2010-11-11 10:20:28,900][INFO ][cluster.metadata ] [Xorr the God-Jewel] [20101027] update_mapping [en] (dynamic)
... successfully processed others
[2010-11-11 10:20:40,730][INFO ][cluster.metadata ] [Xorr the God-Jewel] [20081016] creating index, cause [auto(index api)], shards [5]/[1], mappings []
[2010-11-11 10:20:40,740][INFO ][cluster.metadata ] [Xorr the God-Jewel] [20081027] update_mapping [es] (dynamic)
[2010-11-11 10:20:40,890][INFO ][cluster.metadata ] [Xorr the God-Jewel] [20081016] created and added to cluster_state
[2010-11-11 10:20:40,960][INFO ][cluster.metadata ] [Xorr the God-Jewel] [20081016] update_mapping [es] (dynamic)
... continued processing others

Slave (Server 4)
[2010-11-11 10:17:15,410][INFO ][node ] [Sefton, Amanda] {elasticsearch/0.12.0}[5404]: initializing ...
[2010-11-11 10:17:15,410][INFO ][plugins ] [Sefton, Amanda] loaded []
[2010-11-11 10:17:17,120][INFO ][node ] [Sefton, Amanda] {elasticsearch/0.12.0}[5404]: initialized
[2010-11-11 10:17:17,120][INFO ][node ] [Sefton, Amanda] {elasticsearch/0.12.0}[5404]: starting ...
[2010-11-11 10:17:17,190][INFO ][transport ] [Sefton, Amanda] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.177.164.95:9300]}
[2010-11-11 10:17:20,450][INFO ][cluster.service ] [Sefton, Amanda] detected_master [Xorr the God-Jewel][QQho7EckQoOet726kVPG_Q][inet[/10.177.163.72:9300]], added {[Arlok][79lzFsNVRZWTJBOAm81PYw][inet[/10.177.162.65:9300]],[Xorr the God-Jewel][QQho7EckQoOet726kVPG_Q][inet[/10.177.163.72:9300]],[Blaze, Siena][2reBSun3RZq-qCqOrnhQiw][inet[/10.177.162.57:9300]],}, reason: zen-disco-receive(from [[Xorr the God-Jewel][QQho7EckQoOet726kVPG_Q][inet[/10.177.163.72:9300]]])
[2010-11-11 10:17:20,450][INFO ][discovery ] [Sefton, Amanda] SocialMetrix/luCj2bn8R6GVSAk0AJSYfQ
[2010-11-11 10:17:20,460][INFO ][http ] [Sefton, Amanda] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/10.177.164.95:9200]}
[2010-11-11 10:17:20,460][INFO ][node ] [Sefton, Amanda] {elasticsearch/0.12.0}[5404]: started
[2010-11-11 10:20:40,730][ERROR][transport.netty ] [Sefton, Amanda] Failed to handle exception response
org.elasticsearch.indices.IndexMissingException: [20081016] missing
at org.elasticsearch.cluster.metadata.MetaData.concreteIndex(MetaData.java:178)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.<init>(TransportShardReplicationOperationAction.java:232)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.<init>(TransportShardReplicationOperationAction.java:212)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction.doExecute(TransportShardReplicationOperationAction.java:93)
at org.elasticsearch.action.index.TransportIndexAction.access$101(TransportIndexAction.java:62)
at org.elasticsearch.action.index.TransportIndexAction$1.onFailure(TransportIndexAction.java:99)
at org.elasticsearch.action.support.master.TransportMasterNodeOperationAction$3.handleException(TransportMasterNodeOperationAction.java:175)
at org.elasticsearch.transport.netty.MessageChannelHandler$2.run(MessageChannelHandler.java:169)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

Process info on Server 4 (Sometimes when the server gets in this
state, the CPU climbs and stays ~80%)
root 5404 35.8 61.9 1657240 1304856 ? Sl 10:17 24:01 /usr/lib/jvm/java-6-sun/jre/bin/java -Delasticsearch-service -Des-foreground=yes -Des.path.home=/opt/elasticsearch -Djline.enabled=true -XX:+AggressiveOpts -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:+HeapDumpOnOutOfMemoryError -Xms256m -Xmx1024m -Djava.library.path=/opt/elasticsearch/bin/service/lib -classpath /opt/elasticsearch/bin/service/lib/wrapper.jar:/opt/elasticsearch/lib/elasticsearch-0.12.0.jar:/opt/elasticsearch/lib/jline-0.9.94.jar:/opt/elasticsearch/lib/log4j-1.2.15.jar:/opt/elasticsearch/lib/lucene-analyzers-3.0.2.jar:/opt/elasticsearch/lib/lucene-core-3.0.2.jar:/opt/elasticsearch/lib/lucene-fast-vector-highlighter-3.0.2.jar:/opt/elasticsearch/lib/lucene-queries-3.0.2.jar:/opt/elasticsearch/lib/sigar/sigar-1.6.3.jar -Dwrapper.key=T9MeKiQTKVe0hkUp -Dwrapper.port=32000 -Dwrapper.jvm.port.min=31000 -Dwrapper.jvm.port.max=31999 -Dwrapper.disable_console_input=TRUE -Dwrapper.pid=5402 -Dwrapper.version=3.4.0 -Dwrapper.native_library=wrapper -Dwrapper.service=TRUE -Dwrapper.cpu.timeout=10 -Dwrapper.jvmid=1 org.tanukisoftware.wrapper.WrapperSimpleApp org.elasticsearch.bootstrap.Bootstrap

The servers have 2 gigs of physical memory and we're giving
elasticsearch 1 gig so we don't expect memory to be an issue.


(Shay Banon) #2

Hey,

Can you give master a go? Many things have changed there in how the cluster
state and index creation are propagated through the cluster...

-shay.banon


(Shay Banon) #3

One more thing: why not use the node client? It should provide faster indexing,
and it is much more lightweight in master.

-shay.banon


(ChrisM) #4

We just tried the latest snapshot (master from today) and the client side code ran out of memory on startup :(

10/11/11 16:59:39 WARN transport.netty: [Synch] Exception caught on netty layer [[id: 0x56a6cbf7, /10.177.163.72:44746 => /10.177.163.72:9300]]
java.lang.OutOfMemoryError: Java heap space
at org.elasticsearch.common.io.stream.StreamInput.readUTF(StreamInput.java:113)
at org.elasticsearch.common.io.stream.HandlesStreamInput.readUTF(HandlesStreamInput.java:49)
at org.elasticsearch.common.transport.InetSocketTransportAddress.readFrom(InetSocketTransportAddress.java:69)
at org.elasticsearch.common.transport.TransportAddressSerializers.addressFromStream(TransportAddressSerializers.java:79)
at org.elasticsearch.cluster.node.DiscoveryNode.readFrom(DiscoveryNode.java:208)
at org.elasticsearch.cluster.node.DiscoveryNode.readNode(DiscoveryNode.java:201)
at org.elasticsearch.action.support.nodes.NodeOperationResponse.readFrom(NodeOperationResponse.java:60)
at org.elasticsearch.action.admin.cluster.node.info.NodeInfo.readFrom(NodeInfo.java:192)
at org.elasticsearch.action.admin.cluster.node.info.NodeInfo.readNodeInfo(NodeInfo.java:187)
at org.elasticsearch.action.admin.cluster.node.info.NodesInfoResponse.readFrom(NodesInfoResponse.java:45)
at org.elasticsearch.transport.netty.MessageChannelHandler.handleResponse(MessageChannelHandler.java:124)
at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:102)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:302)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:317)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:299)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:214)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:540)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:274)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:261)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:350)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:281)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:201)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.elasticsearch.common.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:46)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)


(ChrisM) #5

Ok, my OutOfMemoryError was due to a classpath issue and was not an ElasticSearch issue. Now we're testing with the Master to try to reproduce the initial issue.


(ChrisM) #6

With today's snapshot, we experienced the same problem: the slave ES server got an IndexMissingException and hung. The client that sent the data to index also hung. Neither timed out. Below are the log entries from the slave ES server:

[2010-11-11 17:42:56,288][INFO ][node ] [Iron Man] {elasticsearch/0.13.0-SNAPSHOT/2010-11-11T17:10:40}[10223]: initializing ...
[2010-11-11 17:42:56,288][INFO ][plugins ] [Iron Man] loaded []
[2010-11-11 17:42:57,958][INFO ][node ] [Iron Man] {elasticsearch/0.13.0-SNAPSHOT/2010-11-11T17:10:40}[10223]: initialized
[2010-11-11 17:42:57,958][INFO ][node ] [Iron Man] {elasticsearch/0.13.0-SNAPSHOT/2010-11-11T17:10:40}[10223]: starting ...
[2010-11-11 17:42:58,168][INFO ][transport ] [Iron Man] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.177.162.57:9300]}
[2010-11-11 17:43:01,298][INFO ][cluster.service ] [Iron Man] detected_master [Kurse][kIOjlb3XQfats2lh7ypZrg][inet[/10.177.163.72:9300]], added {[Kurse][kIOjlb3XQfats2lh7ypZrg][inet[/10.177.163.72:9300]],[Fury, Jacob "Jake"][N_zicwr_RwWGDMB2vxMCgQ][inet[/10.177.162.65:9300]],}, reason: zen-disco-receive(from [[Kurse][kIOjlb3XQfats2lh7ypZrg][inet[/10.177.163.72:9300]]])
[2010-11-11 17:43:01,298][INFO ][discovery ] [Iron Man] SocialMetrix/S_i15vHFTW24pBSTWVCcnA
[2010-11-11 17:43:01,298][INFO ][http ] [Iron Man] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/10.177.162.57:9200]}
[2010-11-11 17:43:01,298][INFO ][node ] [Iron Man] {elasticsearch/0.13.0-SNAPSHOT/2010-11-11T17:10:40}[10223]: started
[2010-11-11 17:43:22,168][INFO ][cluster.service ] [Iron Man] added {[Slug][g6Q1cf8JRAyjOkrJGhlqpw][inet[/10.177.164.95:9300]],}, reason: zen-disco-receive(from [[Kurse][kIOjlb3XQfats2lh7ypZrg][inet[/10.177.163.72:9300]]])
[2010-11-11 17:44:46,328][ERROR][transport.netty ] [Iron Man] Failed to handle exception response
org.elasticsearch.indices.IndexMissingException: [20090723] missing
at org.elasticsearch.cluster.metadata.MetaData.concreteIndex(MetaData.java:178)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.<init>(TransportShardReplicationOperationAction.java:238)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.<init>(TransportShardReplicationOperationAction.java:218)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction.doExecute(TransportShardReplicationOperationAction.java:97)
at org.elasticsearch.action.index.TransportIndexAction.access$101(TransportIndexAction.java:62)
at org.elasticsearch.action.index.TransportIndexAction$1.onFailure(TransportIndexAction.java:99)
at org.elasticsearch.action.support.master.TransportMasterNodeOperationAction$3.handleException(TransportMasterNodeOperationAction.java:175)
at org.elasticsearch.transport.netty.MessageChannelHandler$2.run(MessageChannelHandler.java:169)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
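
Since neither side timed out, one client-side stopgap is to bound how long the caller waits for a blocking call. The following is a minimal sketch using only `java.util.concurrent`; it is not ElasticSearch API. The names `BoundedCall` and `callWithTimeout` are hypothetical, and the lambda in `main` stands in for the real index request.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: bounds the wait on any blocking call.
class BoundedCall {

    /** Runs the task on a helper thread; returns its result or throws TimeoutException. */
    static <T> T callWithTimeout(Callable<T> task, long timeoutMillis) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } finally {
            // Interrupts the task if it is still running so the helper thread can exit.
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        try {
            // Stand-in for a request that never returns (assumption for illustration).
            callWithTimeout(() -> { new CountDownLatch(1).await(); return "indexed"; }, 200);
        } catch (TimeoutException e) {
            System.out.println("index request timed out");
        }
    }
}
```

This only stops the caller from waiting forever; it does not address whatever makes the server hang.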


(ChrisM) #7

We had initially tried to use a node-based client but had an OutOfMemoryError when creating the node:
Node node = nodeBuilder().client(true).node();
Client client = node.client();


(ChrisM) #8

We've made some changes and think we've determined the issue. We were creating an index per day but were processing documents from many different days (spanning several years) in random order, so almost every index request caused the server to create a new index. Creating a new index appears to take far longer than indexing a document into an existing index. We also found that the GC was running constantly and could never free much memory (e.g. freeing 1MB while 1GB was in use).

If we create many indexes and are using the default of 5000 for index.translog.flush_threshold, would that cause the server to try to keep everything in memory, thus causing it to slow down considerably?

Our hypothesis is that once the server slows down, replicating and/or moving a shard fails silently, and then we start getting IndexMissingException.
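
To put rough numbers on the index-per-day approach, here is a small back-of-the-envelope sketch. The 5 shards, 1 replica and 4 nodes come from our setup; the three-year span is an assumed figure for illustration, and `ShardEstimate` and its methods are hypothetical names.

```java
// Rough estimate of how many shard copies an index-per-day scheme creates.
// The three-year span is an illustrative assumption, not a measurement.
class ShardEstimate {

    /** Total shard copies (primaries plus replicas) across all indices. */
    static long totalShardCopies(long indices, int shardsPerIndex, int replicasPerShard) {
        return indices * shardsPerIndex * (1L + replicasPerShard);
    }

    /** Average shard copies per node, rounded up. */
    static long shardCopiesPerNode(long totalCopies, int nodes) {
        return (totalCopies + nodes - 1) / nodes;
    }

    public static void main(String[] args) {
        long days = 3 * 365;                         // assumed: three years of publication dates
        long total = totalShardCopies(days, 5, 1);   // 5 shards, 1 replica each
        long perNode = shardCopiesPerNode(total, 4); // 4-node cluster
        System.out.println(total + " shard copies, about " + perNode + " per node");
    }
}
```

Under those assumptions that is 10950 shard copies, or roughly 2738 per node, and each shard is a separate Lucene index with its own overhead.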


(Shay Banon) #9

If you end up creating many indices (each with 5 shards, the default, since
you let the index operation create them), you will end up with an
overcapacity on the machine and get into either GC thrashing or eventually
OOM.

You will need to either increase the number of machines you have, or rethink
your index creation strategy (i.e. instead of creating an index per user,
have a single index with the user as a field in each doc).
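
Applied to date-based indices, one way to read this is to derive a coarser index name from the publication date and keep the full date as a field in the document. This is only a sketch under that assumption: `IndexNaming` and `indexFor` are hypothetical names, and whether a monthly, yearly or single index fits depends on your data volume.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper: maps a publication date to an index name at a chosen granularity.
class IndexNaming {

    /** Formats the date in UTC with the given pattern, e.g. "yyyyMMdd", "yyyyMM" or "yyyy". */
    static String indexFor(Date published, String pattern) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(published);
    }

    public static void main(String[] args) {
        GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.clear();
        cal.set(2009, Calendar.JULY, 23);
        Date published = cal.getTime();

        System.out.println(indexFor(published, "yyyyMMdd")); // one index per day
        System.out.println(indexFor(published, "yyyyMM"));   // one index per month
        System.out.println(indexFor(published, "yyyy"));     // one index per year
    }
}
```

Going from daily to monthly names cuts the number of indices by a factor of about 30, and the number of shards along with it.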

-shay.banon

On Thu, Nov 11, 2010 at 11:02 PM, ChrisM cmordue@gmail.com wrote:

We had initially tried to use a node-based client but had an
OutOfMemoryError
when creating the node:
Node node = nodeBuilder().client(true).node();
Client client = node.client();



(system) #10