Data node and client node interaction

We use separate data and client nodes in our cluster, and all reads go through the client nodes, so the client nodes act as coordinating nodes in our case. Given this setup, I have a couple of questions:

  • What happens to the requests in the search queue of the data nodes if the client node that forwarded those requests drops out of the cluster due to OOM?

    Will those requests still get executed, even though the client node (also the coordinating node) is no longer there to collect and merge the outputs from the data nodes?

    Will some other client node take the place of the client node that went OOM and do the merge?

  • In the second case, what happens if a data node that received a request forwarded from a client node goes OOM?

    Will the client node detect that the data node is not responding and send the request to some other data node that holds a replica of this shard?

    Will it wait indefinitely for the data node that went OOM to recover and return its result? Or will it collect the results from the other data nodes (ignoring the node that went OOM) and merge them?

I'm assuming an environment where there are multiple shards, with two replicas.

No, queries are stateful only on the originating node.
An educated guess would be that the queries are just discarded after they have run and the client will need to query again.

I'll see if I can get an official answer though, because it's hard to know everything ES does :)

Thanks @warkolm for your answer, I'm eagerly looking forward to getting an official answer.

Meanwhile, just to clarify a few things about the statefulness of the queries, I have a few questions:

  • Since state is only maintained on the originating node, if this node itself goes OOM, the computation done by the data nodes will be wasted because there is no node left to gather the responses sent by these data nodes, right?

  • Since data nodes are not aware of any state of the query, will they execute all the queries in their search queue even if the node that initiated a query is no longer part of the cluster?

  • If the originating node maintains the state of the queries and a data node goes OOM (while executing a query, or while holding in its queue a query scattered by the originating node), the originating node won't get a result back from this data node. In this case, will the originating node retry the request on some other data node, or will it wait for the data node that went OOM to rejoin and then resend the request?

Did you get a chance to check on this one?

This is correct.

Yes, the coordinating node will try sending the request to other copies of the shard until all shard copies are exhausted without successfully getting a response to the search request. Even in that case, partial results will be returned from all the shards that successfully returned a response to the query.
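
To make the retry behaviour above concrete, here is a minimal toy sketch in Python. It is not Elasticsearch code: `query_copy`, `coordinate_search`, the node names, and the hit strings are all hypothetical, and the returned dictionary is only loosely modeled on the `_shards` section of a search response. It assumes a failed node simply returns nothing, and it shows the coordinating node moving on to the next copy of a shard and still returning partial results when every copy of some shard is unavailable.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy model, not Elasticsearch code.
# Maps a shard id to the nodes that hold a copy of it (primary or replica).
ShardCopies = Dict[int, List[str]]


def query_copy(node: str, shard: int, live_nodes: Set[str]) -> Optional[List[str]]:
    """Pretend to run the query on one shard copy; None means the node failed."""
    if node not in live_nodes:
        return None
    return [f"hit-from-shard-{shard}"]


def coordinate_search(shard_copies: ShardCopies, live_nodes: Set[str]) -> dict:
    """Try each copy of every shard in turn and merge whatever succeeded."""
    hits: List[str] = []
    successful = 0
    failed = 0
    for shard, nodes in sorted(shard_copies.items()):
        for node in nodes:
            result = query_copy(node, shard, live_nodes)
            if result is not None:
                hits.extend(result)
                successful += 1
                break  # one successful copy per shard is enough
        else:
            # Every copy of this shard was tried without success.
            failed += 1
    return {
        "_shards": {
            "total": len(shard_copies),
            "successful": successful,
            "failed": failed,
        },
        "hits": hits,
    }


# Shard 0 has three copies, and the first one sits on a node that went OOM.
# Shard 1 has a single copy, on a node that is also gone.
copies = {0: ["data-1", "data-2", "data-3"], 1: ["data-4"]}
response = coordinate_search(copies, live_nodes={"data-2", "data-3"})
print(response["_shards"])
print(response["hits"])
```

In this toy run, shard 0 is answered by its second copy while shard 1 is counted as failed, so the merged response contains hits from shard 0 only. On a real cluster, the `_shards` section of the search response is where a client can see whether the results it received are partial.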
