I’m looking for clarity on the intended behavior of the
setOperationThreaded(…) and listenerThreaded(…) methods in the Java APIs
that interact with an embedded cluster. I’ve pieced together some
assumptions based on the snippets of documentation, posts, and test cases
I’ve run. However, assumptions are dangerous when building a production
system so I figured it’s best to ask the experts:
What is the difference between setOperationThreaded and
listenerThreaded (and setListenerThreaded)? Some requests only support one
or the other while other requests support both.
I’ve run into cases where modifications to an index or a document
don’t take effect immediately. For example, if I delete an index (or
mapping) then immediately add a new index, sometimes the new index doesn’t
take effect:
// Out with the old…
final DeleteIndexResponse response = _client.admin().indices().delete(
    new DeleteIndexRequest(indices)).actionGet();
// In with the new…
final PutMappingResponse putResponse =
    _client.admin().indices().preparePutMapping(index)
        .setType(type).setSource(xmapping)
        .execute().actionGet();
I believe this is because the operation is threaded by default and it’s
running into a race condition. I’ve verified this by adding a
Thread.sleep(1000) call between operations:
// Out with the old…
final DeleteIndexResponse response = _client.admin().indices().delete(
    new DeleteIndexRequest(indices)).actionGet();
// Nap time.
Thread.sleep(1000);
// In with the new…
final PutMappingResponse putResponse =
    _client.admin().indices().preparePutMapping(index)
        .setType(type).setSource(xmapping)
        .execute().actionGet();
Ideally I would like some operations to behave synchronously, so that one
call can assume the previous one has completed. Is this where I would use
setOperationThreaded or listenerThreaded? Would something like the code
below guarantee that the index is deleted before the new one is created?
// Out with the old…
final DeleteIndexResponse response = _client.admin().indices().delete(
    new DeleteIndexRequest(indices).listenerThreaded(false)
    ).actionGet();
// In with the new…
final PutMappingResponse putResponse =
    _client.admin().indices().preparePutMapping(index)
        .setType(type).setSource(xmapping)
        .execute().actionGet();
I do not fully understand why an index must be deleted immediately before
it is created again. If you want to switch atomically from one index to
another, index aliases are the better way to go: create a new index
without deleting the old one, switch over to the new one with the help of
an index alias, then drop the old index.
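The three alias steps above can be sketched as a toy in-memory model. This uses only the Java standard library and is not the Elasticsearch client API. The class name, the index names docs_v1 and docs_v2, and the alias name docs are all made up for illustration. The point is the ordering: the alias is repointed in one step, and the old index is dropped only once nothing refers to it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of "create new index, switch alias, drop old index".
// Hypothetical names throughout; this does not talk to a cluster.
class AliasSwitchSketch {
    private final Set<String> indices = new HashSet<>();
    private final Map<String, String> aliases = new HashMap<>();

    synchronized void createIndex(String name) {
        indices.add(name);
    }

    // Repoint the alias in a single step, so a reader resolving the alias
    // sees either the old index or the new one, never a missing alias.
    synchronized void switchAlias(String alias, String newIndex) {
        if (!indices.contains(newIndex)) {
            throw new IllegalArgumentException("no such index: " + newIndex);
        }
        aliases.put(alias, newIndex);
    }

    // Refuse to drop an index that an alias still points to.
    synchronized void deleteIndex(String name) {
        if (aliases.containsValue(name)) {
            throw new IllegalStateException("index still aliased: " + name);
        }
        indices.remove(name);
    }

    synchronized String resolve(String alias) {
        return aliases.get(alias);
    }

    synchronized boolean exists(String index) {
        return indices.contains(index);
    }

    static AliasSwitchSketch demo() {
        AliasSwitchSketch cluster = new AliasSwitchSketch();
        cluster.createIndex("docs_v1");
        cluster.switchAlias("docs", "docs_v1");  // starting point

        cluster.createIndex("docs_v2");          // 1. build the new index next to the old one
        cluster.switchAlias("docs", "docs_v2");  // 2. repoint the alias
        cluster.deleteIndex("docs_v1");          // 3. drop the old index
        return cluster;
    }

    public static void main(String[] args) {
        AliasSwitchSketch cluster = demo();
        System.out.println("docs -> " + cluster.resolve("docs"));
    }
}
```

Because callers only ever address the alias, there is no window in which the name they use is missing, which is the race the delete-then-create sequence runs into.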
All API calls on Elasticsearch are executed asynchronously. This means
scalable execution, but it also means that it is not possible to enforce
waiting for all nodes to complete. There are various reasons for that (e.g.
it would be fatal if just one busy or erratic node could block all requests
on a cluster).
When you send a request to a node, the operation can execute on other
nodes. Admin operations are executed on the master. The master decides how
to use cluster locks or index locks.
With setListenerThreaded(true) you can wait in a client for a response from
the cluster node in the same thread. This does not mean the execution must
be synchronous with all affected nodes in the cluster.
With setOperationThreaded(true), you can serialize an index-based operation
for an index over the index shards. This may save thread resources for
index-based operations on a node, but it may take longer.
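The listener question largely comes down to which thread ends up running the response callback. Below is a toy sketch of the two delivery options, using only the Java standard library. The thread names "transport" and "listener" and the handOff parameter are made up for illustration. Which flag value maps to which option in the real client is not settled in this thread, so treat any mapping as an assumption to check against the client javadoc.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Toy model only: shows where a response callback can run.
// It does not use or reproduce the Elasticsearch client.
class ListenerThreadingSketch {

    // Runs a fake operation on a "transport" thread, then delivers the
    // response to the callback either inline on that same thread or by
    // handing it off to a separate "listener" thread.
    // Returns the name of the thread that actually ran the callback.
    static String run(boolean handOff) {
        ExecutorService transport =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "transport"));
        ExecutorService listeners =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "listener"));
        CompletableFuture<String> callbackThread = new CompletableFuture<>();
        Runnable callback =
                () -> callbackThread.complete(Thread.currentThread().getName());
        try {
            transport.execute(() -> {
                // The operation itself would complete here.
                if (handOff) {
                    listeners.execute(callback);  // callback on its own thread
                } else {
                    callback.run();               // callback blocks this thread
                }
            });
            // Blocking here is the analogue of waiting on the response;
            // it says nothing about what other nodes have done.
            return callbackThread.join();
        } finally {
            transport.shutdown();
            listeners.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("inline:   " + run(false));
        System.out.println("hand-off: " + run(true));
    }
}
```

In both cases the caller still blocks until the callback has run, so neither option by itself makes a later request wait for every node in the cluster.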
There is also a masterNodeTimeout(timeout) method to control how long the
master node should wait for nodes, for master node operations like index
deletions.
If you want to wait for a delete index operation, maybe issuing a refresh
action will block until all indices are up to date in the cluster state.
Thank you for the reply. This is helpful for me to understand the
difference between operation and listener threading. Based on your advice,
I'll always assume asynchronous and leverage an index alias to achieve the
same result.
Thanks again!
On Saturday, October 19, 2013 3:17:45 PM UTC-7, Jörg Prante wrote:
[quoted reply shown in full above]
Hi Alex,
I have a few things to add.
We've been doing some work to make Elasticsearch more testable (hopefully
avoiding Thread.sleep). Part of this is around acknowledgements after a
cluster state update, which can be a settings update, an index creation or
deletion, a put mapping operation, an alias creation and so on. As Jörg
said, those operations are redirected to the master node, which is the only
one allowed to modify the cluster state. After the change has been applied
on the master, the updated cluster state is pushed to all the other nodes.
In most of the APIs (we are working on making the behaviour consistent
here) you can control how long to wait for the operation to be executed on
all nodes. The parameter is called "timeout" and defaults to 10 seconds.
The JSON response contains an acknowledged field that tells whether the
request was acknowledged by all nodes. If it is "true", all the nodes are
aware of the change to the cluster state and hold the updated one.
For instance, after you delete an index, your code could check the response
and see whether it was acknowledged.
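The acknowledgement idea can be pictured with a toy model, again using only the Java standard library and not the Elasticsearch API. The method name publishAndAwaitAcks and the per-node delays are made up for illustration. Each simulated node applies the change after its own delay, and the caller learns whether every node did so within the timeout, which replaces a fixed Thread.sleep with an explicit yes/no answer.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Toy model of "acknowledged by all nodes within a timeout".
// Hypothetical names; no cluster is involved.
class AckTimeoutSketch {

    // Pushes a change to one simulated node per entry in nodeDelaysMillis.
    // Returns true only if every node applied it within timeoutMillis.
    static boolean publishAndAwaitAcks(long[] nodeDelaysMillis, long timeoutMillis) {
        CountDownLatch acks = new CountDownLatch(nodeDelaysMillis.length);
        ExecutorService nodes =
                Executors.newFixedThreadPool(Math.max(1, nodeDelaysMillis.length));
        try {
            for (long delay : nodeDelaysMillis) {
                nodes.execute(() -> {
                    try {
                        Thread.sleep(delay);  // node is busy applying the new state
                        acks.countDown();     // node acknowledges
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return acks.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            nodes.shutdownNow();  // stop waiting on slow nodes
        }
    }

    public static void main(String[] args) {
        boolean fast = publishAndAwaitAcks(new long[] {10, 20}, 2000);
        boolean slow = publishAndAwaitAcks(new long[] {10, 5000}, 100);
        System.out.println("all nodes fast: acknowledged=" + fast);
        System.out.println("one node slow:  acknowledged=" + slow);
    }
}
```

A caller that gets false back knows the change may not be visible everywhere yet and can decide whether to retry, wait, or fail, rather than guessing at a sleep duration.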
Anyway, it would be nice to know what you were trying to achieve with your
code. I see you delete a set of indices, then you put a mapping for another
one. Is this second index part of the set that you delete in the first
operation? Also, which version of Elasticsearch are you using?
Cheers
Luca
On Sunday, October 20, 2013 9:05:02 PM UTC+2, Alex Clark wrote:
[quoted replies shown in full above]