How to wait for a CreateIndexRequest to really finish (using java TransportClient)

Hi there,

I'm using the native client (TransportClient) and I'm running the following
code:

Settings settings = ImmutableSettings.settingsBuilder()
        .put("cluster.name", "elasticsearch").build();
TransportClient client = new TransportClient(settings);
client.addTransportAddress(new InetSocketTransportAddress("localhost", 9300));

Settings indexSettings = ImmutableSettings.settingsBuilder()
        .put("number_of_shards", 4)
        .put("number_of_replicas", 3).build();
CreateIndexRequestBuilder createIndexBuilder =
        client.admin().indices().prepareCreate("newindex");
createIndexBuilder.setSettings(indexSettings);
CreateIndexResponse createIndexResponse = createIndexBuilder.execute().actionGet();
System.out.printf("acknowledged: %s%n", createIndexResponse.acknowledged());

GetRequestBuilder get = client.prepareGet();
get.setIndex("newindex");
get.setType("mytype");
get.setId("myid");
try {
    GetResponse response = get.execute().actionGet(1000);
    System.out.printf("exists: %s%n", response.exists());
} catch (Throwable ex) {
    ex.printStackTrace();
}

When I execute it for the first time on a fresh node, I always get the
following error:

acknowledged: true
org.elasticsearch.action.NoShardAvailableActionException: [newindex][0] No shard available for [[newindex][mytype][myid]: routing [null]]
    at org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction$AsyncSingleAction.perform(TransportShardSingleOperationAction.java:139)
    at org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction$AsyncSingleAction.start(TransportShardSingleOperationAction.java:124)
    at org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction.doExecute(TransportShardSingleOperationAction.java:71)
    at org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction.doExecute(TransportShardSingleOperationAction.java:46)
    at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:61)
    at org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction$TransportHandler.messageReceived(TransportShardSingleOperationAction.java:217)
    at org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction$TransportHandler.messageReceived(TransportShardSingleOperationAction.java:199)
    ...

When I delete the index and run this code again, it runs just fine (and
prints exists:false obviously).
When I execute the GetRequest a little bit later, it runs just fine too.

How do I wait for the CreateIndexRequest to be really completed before
starting other processing on the new index?

Jaap

--

I've tested it a little more. Sometimes a new index name runs OK, so some
chance is involved, but I get the error quite often.

--

Just add a waitForYellow after index creation.

Something like this:
https://github.com/elasticsearchfr/hands-on/blob/answers/src/test/java/org/elasticsearchfr/handson/StartNode.java#L32

It should help.

David.

--
David Pilato
http://www.scrutmydocs.org/
http://dev.david.pilato.fr/
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

--

Nice, that really helped!

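FYI, I opened the following GitHub issue to ask for some more automation
for this case:
An acknowledged CreateIndexRequest followed by GetRequest sometimes fails · Issue #2527 · elastic/elasticsearch · GitHub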

This is what I used in the end:

ClusterHealthRequestBuilder healthRequest = client.admin().cluster().prepareHealth();
healthRequest.setIndices(newIndexName); // only request health of this index...
healthRequest.setWaitForYellowStatus();
ClusterHealthResponse healthResponse = healthRequest.execute().actionGet();
System.out.printf("status: %s%n", healthResponse.status());

I restrict the health request to the new index so that parallel indexing
processes on the same cluster don't make the health check wait longer than
needed for my case.
Is that going to do what I expect of it?

Thanks!

--

I don't fully understand your issue.

  1. you create an index
  2. you immediately issue a get request on it, despite not having inserted
    a single document.

This will never work out right. Why don't you insert documents first? What
is your intention?

Creating an index is an asynchronous node operation executed by the master
of the cluster. The cluster master node receives the creation request,
submits the index creation to the participating nodes, and returns a
response to the client. Meanwhile, depending on how busy the other nodes
are, they create the shards, and all shards register in the cluster state
for routing. When enough shards have registered, the index state changes.
This operation is encapsulated by an index-level block. Unfortunately,
Elasticsearch does not exactly know in advance when the first shard will
appear in the routing, so, because the create index call must not hang, it
returns early.

Adding a globally blocking call to the index creation API would introduce
other problems. Index creation could then probably hang a client more
easily.

Waiting for state yellow/green is similar to waiting until the master no
longer blocks the index.

There is another method in the cluster health request
"waitForActiveShards()" which can wait for a certain number of shards in
that index. In most cases, waiting for one shard is enough to continue
without error.
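
To make the timing described above concrete, here is a toy, in-memory sketch in plain Java. It does not use the Elasticsearch client at all: `ToyIndex`, `create`, `awaitActiveShard`, and `exists` are hypothetical names invented for this illustration. Creation returns at once while a background thread "starts" the shard later, so an immediate lookup fails, and waiting for one active shard first lets the same lookup succeed.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class AsyncCreateDemo {

    /** Toy stand-in (hypothetical) for an index whose first shard becomes active after creation. */
    static final class ToyIndex {
        private final CountDownLatch firstShardActive = new CountDownLatch(1);

        /** Returns immediately ("acknowledged"); the shard is activated later in the background. */
        static ToyIndex create(long shardStartupMillis) {
            ToyIndex index = new ToyIndex();
            Thread starter = new Thread(() -> {
                try {
                    Thread.sleep(shardStartupMillis);
                    index.firstShardActive.countDown();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            starter.setDaemon(true);
            starter.start();
            return index;
        }

        /** Blocks until one shard is active; returns false on timeout or interrupt. */
        boolean awaitActiveShard(long timeoutMillis) {
            try {
                return firstShardActive.await(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }

        /** Fails while no shard is active; otherwise reports whether the document exists. */
        boolean exists(String id) {
            if (firstShardActive.getCount() > 0) {
                throw new IllegalStateException("No shard available for [" + id + "]");
            }
            return false; // the toy index never holds any documents
        }
    }

    public static void main(String[] args) {
        ToyIndex index = ToyIndex.create(300);
        try {
            index.exists("myid"); // too early: the shard is not active yet
        } catch (IllegalStateException e) {
            System.out.println("early get failed: " + e.getMessage());
        }
        if (index.awaitActiveShard(5_000)) {
            System.out.println("exists: " + index.exists("myid"));
        }
    }
}
```

With the real client, the waiting step would be the cluster health request restricted to the new index, as shown earlier in the thread; the toy latch only stands in for it.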

Note, it is a common task to check the cluster health first before indexing
(either for yellow or green). ES does not hinder you from indexing (or
getting) docs, even if the cluster is red.

Jörg

--

Whoops,

the last question wasn't really clear.

About the health call, setIndices

On Tuesday, January 8, 2013 4:33:50 PM UTC+1, Jaap Taal wrote:

Nice, that really helped!

FYI, I opened the following github issue to ask for some more automation
for this case:
An acknowledged CreateIndexRequest followed by GetRequest sometimes fails · Issue #2527 · elastic/elasticsearch · GitHub

--

It's a narrowed-down example. When you're building a complex indexing
application, the situation can arise that you've just created an index in
one function and try to check for the existence of a document by id in
another.

Fortunately, in my case both functions run on the same thread, so I'd like
to make sure that after my index gets created, the other code doesn't fail
with the exception that shard number 0 is not ready. The point is that the
get-by-id operation routes the key directly to a shard that is not ready yet.
When you're searching this also happens:
acknowledged: true
org.elasticsearch.action.search.SearchPhaseExecutionException: Failed to execute phase [query], total failure; shardFailures {[na][prices-2p-d20130118-i20130107][0]: No active shards}{[na][prices-2p-d20130118-i20130107][1]: No active shards}{[na][prices-2p-d20130118-i20130107][2]: No active shards}{[na][prices-2p-d20130118-i20130107][3]: No active shards}
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.onFirstPhaseResult(TransportSearchTypeAction.java:260)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.start(TransportSearchTypeAction.java:145)
    at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:58)
    at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:48)

What I proposed was an optional flag, just like "refresh=true", but for the
CreateIndexRequest. If that is not desirable, this quirk should be
documented, and elasticsearch.org could suggest an example of how to work
around it.

Checking the health of the cluster before indexing each document is not
feasible; it would be too slow.

--

It's not a quirk. It's the distributed architecture of Elasticsearch. There
is no easy method to implement a synchronous index creation, as I tried to
clarify. It would introduce more issues than it may solve.

Behind the scenes, your proposal would be implemented as a sequence of two
commands followed by the response to the client: first the index creation,
then a cluster state wait. It's just "sugar coating", no new function,
because clients can already use this two-command sequence. Note that a
health check returns immediately, or waits internally in a loop, 200 ms at a
time, until the requested condition is satisfied. I can hardly imagine a
faster method in a distributed system, where the cluster state propagates
through a number of connected nodes.
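
The check-then-sleep loop described above can be sketched in plain Java. This is only an illustration of the pattern, not Elasticsearch's actual implementation: `ReadinessPoller` and `waitUntil` are hypothetical names, and the condition is a generic `BooleanSupplier` standing in for "the requested health status has been reached".

```java
import java.util.function.BooleanSupplier;

/** Polls a readiness check at a fixed interval until it passes or a deadline expires. */
final class ReadinessPoller {

    /** Returns true if the condition became true in time, false on timeout or interrupt. */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true; // already satisfied: return without sleeping
            }
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMillis <= 0) {
                return false; // timed out
            }
            try {
                // Sleep one poll interval, but never past the deadline.
                Thread.sleep(Math.min(pollMillis, remainingMillis));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Pretend the shards need about half a second to start.
        long readyAt = System.currentTimeMillis() + 500;
        boolean ready = waitUntil(() -> System.currentTimeMillis() >= readyAt, 5_000, 200);
        System.out.println("ready: " + ready);
    }
}
```

The point of the sketch is that a condition which already holds costs a single check, which is why waiting right after index creation adds little overhead once the shards are up.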

A parameter name proposal of "refresh=true" is rather unfortunate. It
shouldn't be confused with an index refresh since there are no documents to
refresh.

Some suggestions:

  • use index aliases for shard presence guarantee. If you can create a
    variable number of indices to search in parallel, adding a new index to
    existing indices by an index alias will never throw a message "no active
    shards".

  • use front-end index control, for example by a reverse proxy. If
    malevolent clients ask for documents which might not be there, catch the
    index creation error message, and wait in the front-end for cluster health,
    and execute the search afterwards.
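
The second suggestion (catch the failure, wait for cluster health, then run the request again) could look roughly like the following plain-Java sketch. `RetryAfterReady` and `callWithOneRetry` are hypothetical names, the operation and the readiness wait are passed in as generic functions, and catching every `RuntimeException` is a simplification; real code would narrow this to the specific "no shard available" failures shown earlier in the thread.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

final class RetryAfterReady {

    /**
     * Runs the operation; if it fails, waits for readiness and retries exactly once.
     * The first failure is rethrown when readiness is never reached.
     */
    static <T> T callWithOneRetry(Supplier<T> operation, BooleanSupplier waitForReady) {
        try {
            return operation.get();
        } catch (RuntimeException firstFailure) {
            // Assumption: the failure means "shards not active yet".
            if (!waitForReady.getAsBoolean()) {
                throw firstFailure;
            }
            return operation.get();
        }
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        String result = callWithOneRetry(
                () -> {
                    if (attempts[0]++ == 0) {
                        throw new IllegalStateException("No active shards");
                    }
                    return "search executed";
                },
                () -> true); // stand-in for "wait for cluster health"
        System.out.println(result + " after " + attempts[0] + " attempts");
    }
}
```

This keeps the common path free of any health check and only pays for the wait when a request actually hits an index that is not ready.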

Jörg

--

I love the first suggestion. Smart.

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

--

Understood, will not nag again ;-)

Thanks for your explanation! Using an alias is definitely what I want, and
since a newly created index is always empty, it would allow me to do this
without waiting for a status. Still, it's always good to know how and why
you should wait for active shards or the yellow status (I'd think waiting
for active shards is the better method for a newly created index).

Jaap Taal

[ Q42 BV | tel 070 44523 42 | direct 070 44523 65 | http://q42.nl |
Waldorpstraat 17F, Den Haag | KvK 30164662 ]
