Requests Containing an Array of Custom Routing Values

Hi Guys,
We have a cluster partitioned by a certain key for which we are using custom routing. We now are having a use-case where we have to request data across different key with the key list to be large. So wanted to understand how is the custom routing query processed? I know when we specify a list of routing keys, elastic search goes to all the shards mapping those sets of routing keys and fetches the data. I have a couple of questions around that

  • Lets say the custom routing parameter array contains 1000 values. There are only 10 shards though. Will there be 1000 request internally with one for each routing or at max there will be 10 request with one request per shard.
  • When returning data with query containing multiple routing values, does elasticsearch paginate the requests (sort request across all partitions) and return the requested page or it returns data partitioned by routing key.

We aren't all guys :slight_smile:

Requests are sent per shard. So you essentially end up with one bulk request with 100 individual requests per shard.

Elasticsearch will just look at it as one response, irrespective of the routing. I don't know if there is a way to change that though.

Thanks @warkolm for your response.

My sincere apologies

So in the above example assuming that the expected response to be top 25 records

  • The co-ordinating shard will query all the 10 shards with each one returning 25 records.
  • Then the co-ordinating shard will sort the 250 records and return the top 25 of them.
    Is the above flow correct?

Yes, it's the same as any other query.

Is there a way to specify in the query to tell final shard to query and filter for only the list of values that it was resolved to based on the custom routing provided.

For example, lets say the cluster is partitioned by attribute A which is also present in the document.

  • We have a query to get the top 25 documents where document's attribute A value should be among (A1, A2, .... A50).
  • We have 10 shards such that shard S1 contains A1.. A5 and shard S2 contains A6 to A10 and so on.

Is there a way to query such that

  • Shard S1 queries for A1 to A5 in its index and not from A6 to A50. Similarly S2 queries only for A6 to A10 and so on.
    This way shard S1 doesn't need to processes values that it doesn't contain and serves the query faster.

If you want to do that then you should use filters, that way it will use bitsets which are super efficient.

Did you mean Bool Query | Elasticsearch Guide [6.5] | Elastic
or Terms query | Elasticsearch Guide [8.11] | Elastic

Couldn't find any documentation stating terms supported under filter?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.