Why is only one node processing the query?

Hi everyone.

I had an index that had 5 primary shards and around 250GB of data in total (primaries only). I decided to reindex it with more primary shards (30) to see if I would have performance improvements. The index has 3 more replicas that go to the other nodes.

After reindexing the data I noticed something weird in the search profiler in kibana. The queries that I send are being only processed by one node at a time.

While the index with 5 primary shards does actually process the requests in different nodes

From my understanding isn't elasticserach suppose to process the query in parallel in different nodes if those nodes contain the shards of the index or is it just my incorrect understanding?

I've made sure that the shards are allocated into each of the nodes so that all of them can process the query (be that a primary or replica shard).

My current setup includes 4 nodes each being data and master nodes with 8CPU and 30GB of RAM that are running elasticserach 7.13 running with ECK (kubernetes).

Is there an index setting that I should enable or maybe it's something else I should do to allow all the nodes to participate in the query processing?

The node that receives the query should be selecting a shard - either primary or replica - at random.
It should be sending requests to other shards on other nodes, so my questions would be;

  • does this happen every query made?
  • what is the query?

This is happening only when i query this index and not the other indices that i query.

The field that I'm searching (content) is a text field.

The query is this (it's the same query used in the index that had 5 primaries).

{
  "size": 2, 
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "content": {
              "query": " {text to search}"
              , "minimum_should_match": "60%"
            }
          }
        },
        {
          "match_phrase": {
            "content": {
              "query": "{text to search}"
              , "slop": 100
            }
          }
        }
      ]
    }
  }
}

How many replicas were configured when you had 5 primary shards? Did you at this time also have 4 nodes?

I had 2 replicas when i had 5 primary shards.

No i had 3 nodes at the time.

Right now the indices are as follows:

5 primary shard index has 2 replicas across 4 nodes
30 primary shard index has 3 replicas across 4 nodes

Could this have to do anything with elasticsearch running in a kubernetes environment?
I'm using ECK to provision the elasticsearch cluster

I switched to a 3 node cluster with 16CPU and 60RAM where all the nodes are master and data nodes just so see if it is having any effects.

To my surprise it looks like now the queries are only being handled by only one node at a time regardless of the index I am choosing.

I don't really know what to search in regards to this, or which parts of the documentation to refer to weather that be some sort of cluster setting or index setting that may be effecting this.

Could the fact that the cluster does not have any dedicated master nodes have anything to do with this? Or maybe it needs dedicated client nodes to distribute the search between shards on different nodes?

In any case i will try and experiments with things like this to see if maybe that is the problem.

Recent versions of Elasticsearch has adaptive replica selection enabled by default. It is possible this a low query volumes favours local shards but I do not know the internals. You can control this by d as pacifying a preference if you want to and see if this makes a difference.

Optimising for processing a single query at a time may not being optimal if you in real life expect a higher number of concurrent queries.

Thank you for the insights.

I tried disabling the adaptive replica selection and using the preference parameter to set the nodes.
That worked, however if i do not set a preference where i put all the nodes in the _only_nodes value the same thing is happening.

Is this the intended behavior of the adaptive replica selection though? I would assume that the option was enabled for better performance and not let the user select which nodes to be selected (I can understand the times where that would be useful though)

Whichever setting i put in the adaptive replica selection (true or false) the query only runs on a single node unless a preference value with _only_nodes:<node1>,<node2>,.. is specified.

It's now happening for every index that i run the query.

I did a little experiment where I set the number of replicas to 0.

Now the query is being processed on all nodes since no single node has all the shards to to process the query.