I had an index with 5 primary shards and around 250GB of data in total (primaries only). I decided to reindex it with more primary shards (30) to see whether I would get any performance improvements. The index also has 3 replicas that go to the other nodes.
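(For reference, the reindex was done roughly like this; the index names below are just placeholders for my actual ones.)

```
// "my-old-index" / "my-new-index" are placeholders for the real index names
PUT /my-new-index
{
  "settings": {
    "number_of_shards": 30,
    "number_of_replicas": 3
  }
}

POST /_reindex
{
  "source": { "index": "my-old-index" },
  "dest": { "index": "my-new-index" }
}
```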
After reindexing the data I noticed something weird in the Search Profiler in Kibana: the queries I send are only being processed by one node at a time.
From my understanding, isn't Elasticsearch supposed to process the query in parallel on different nodes if those nodes contain shards of the index, or is my understanding incorrect?
I've made sure that shards are allocated to each of the nodes so that all of them can process the query (be that with a primary or a replica shard).
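(I checked the allocation in Kibana Dev Tools with something like the following, again with a placeholder index name.)

```
// placeholder index name
GET _cat/shards/my-new-index?v
```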
My current setup includes 4 nodes, each being a data and master node with 8 CPUs and 30GB of RAM, running Elasticsearch 7.13 on ECK (Kubernetes).
Is there an index setting that I should enable, or maybe something else I should do, to allow all the nodes to participate in the query processing?
The node that receives the query should be selecting a shard (either primary or replica) at random.
It should then be sending requests to shards on the other nodes, so my questions would be:
I switched to a 3-node cluster with 16 CPUs and 60GB of RAM per node, where all the nodes are master and data nodes, just to see if it has any effect.
To my surprise, it looks like the queries are still being handled by only one node at a time, regardless of the index I choose.
I don't really know what to search for in regard to this, or which parts of the documentation to refer to, whether that be some sort of cluster setting or index setting that may be affecting this.
Could the fact that the cluster does not have any dedicated master nodes have anything to do with this? Or maybe it needs dedicated client nodes to distribute the search between shards on different nodes?
In any case I will try to experiment with things like this to see if maybe that is the problem.
Recent versions of Elasticsearch have adaptive replica selection enabled by default. It is possible that at low query volumes this favours local shards, but I do not know the internals. You can control this by specifying a preference if you want to and see if that makes a difference.
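If you want a quick test, something along these lines should disable it cluster-wide so you can re-run your query and compare:

```
// turn adaptive replica selection off cluster-wide (the default is true)
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.use_adaptive_replica_selection": false
  }
}
```

You can set it back to true (or to null to restore the default) afterwards.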
Optimising for processing a single query at a time may not be optimal if in real life you expect a higher number of concurrent queries.
I tried disabling the adaptive replica selection and using the preference parameter to set the nodes.
That worked; however, if I do not set a preference where I put all the nodes in the _only_nodes value, the same thing happens.
Is this the intended behavior of adaptive replica selection though? I would assume the option was enabled for better performance, not to make the user select which nodes should be used (although I can understand the cases where that would be useful).
Whichever setting I use for adaptive replica selection (true or false), the query only runs on a single node unless a preference value of _only_nodes:<node1>,<node2>,... is specified.
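For example, a search like the one below does get spread across the nodes, but as soon as I drop the preference parameter everything goes to a single node again (the node names are placeholders for my actual nodes, and the match_all body is just a stand-in for my real query):

```
// node names and the match_all body are placeholders
GET /my-new-index/_search?preference=_only_nodes:<node1>,<node2>,<node3>
{
  "query": { "match_all": {} }
}
```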
It's now happening for every index that I run the query against.
This is indeed surprising behavior! If you haven't figured this out already, would you be able to paste the following information? It would help us investigate and determine whether there's a bug:
Elasticsearch version
The output of GET /_nodes/stats/_all, shortly before you run your test. This will include adaptive replica selection stats.
The output of GET /_cat/shards, just in case the shards aren't assigned as we expect.
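In Kibana Dev Tools that would be something like:

```
// reports the Elasticsearch version and build
GET /

// node stats, including the adaptive replica selection info
GET /_nodes/stats/_all

// how the shards are currently allocated across the nodes
GET /_cat/shards
```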