Is there a way to specify _routing=all or similar for GET requests?


(Yehosef) #1

I opened an issue about this at https://github.com/elastic/elasticsearch/issues/28120 but I haven't heard anything. Basically, I know that when accessing documents I can specify a _routing param, and I know that I can specify multiple values.

But I want to know if there is a way to tell it to check all the shards. Is that possible?

Given that I can already check multiple shards, I would expect to be able to ask for all of them.


(Luiz Santos) #2

Hi @yehosef,

I read the issue you opened and I think you misunderstood how routing works. Please, let me know if I'm wrong.

By default, you don't have to specify the routing field to index a document. Elasticsearch determines which primary shard your document will live in by doing simple math:

shard = hash(routing) % number_of_primary_shards

where routing by default is _id of the document.
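
As a rough illustration of that formula (a sketch only: Elasticsearch uses its own Murmur3-based hash internally, and `zlib.crc32` below is just a stand-in, so the shard numbers will not match a real cluster; `pick_shard` is a made-up helper name):

```python
import zlib


def pick_shard(routing: str, number_of_primary_shards: int) -> int:
    """Toy version of: shard = hash(routing) % number_of_primary_shards."""
    # zlib.crc32 is a stand-in hash, NOT the function Elasticsearch uses.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


# With no explicit routing, the document _id plays the role of the routing value.
doc_id = "1"
print(pick_shard(doc_id, 5))

# The same input always maps to the same shard, which is why a plain GET by id
# only has to look in one place.
print(pick_shard(doc_id, 5) == pick_shard(doc_id, 5))
```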

If you don't want to change this default behaviour, all you have to do is get the document by its id, for example:

#index without routing parameter (default)
POST my_index/doc/1
{
  "field": "value"
}

GET my_index/doc/1

Elasticsearch will do the simple math again, determine which shard the document with id = 1 was indexed in, and retrieve it. It's not necessary to search all shards, as the document is guaranteed to reside in only one primary shard.

If you have changed the default behaviour by specifying the routing parameter when indexing a document, Elasticsearch assumes that when you get this document you know what the routing was. Example:

POST my_index/doc/1?routing=asdf
{
  "field": "value"
}

# not found
GET my_index/doc/1

{
  "_index": "my_index",
  "_type": "doc",
  "_id": "1",
  "found": false
}

# found!
GET my_index/doc/1?routing=asdf

{
  "_index": "my_index",
  "_type": "doc",
  "_id": "1",
  "_version": 1,
  "_routing": "asdf",
  "found": true,
  "_source": {
    "field": "value"
  }
}
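
To make the reason for the miss concrete, here is a toy in-memory model of the behaviour above. It is only a sketch under assumptions: the shard count is made up, `zlib.crc32` stands in for the real hash, and `index_doc` / `get_doc` are hypothetical helpers, not Elasticsearch APIs.

```python
import zlib

NUM_SHARDS = 5  # hypothetical number of primary shards


def pick_shard(routing: str) -> int:
    # Stand-in hash; a real cluster hashes differently.
    return zlib.crc32(routing.encode("utf-8")) % NUM_SHARDS


# Each "shard" is just a dict of _id -> _source.
shards = [dict() for _ in range(NUM_SHARDS)]


def index_doc(doc_id, source, routing=None):
    # The routing value (or the _id when none is given) picks the shard.
    shards[pick_shard(routing or doc_id)][doc_id] = source


def get_doc(doc_id, routing=None):
    # Only the one shard selected by the routing value is consulted.
    return shards[pick_shard(routing or doc_id)].get(doc_id)


index_doc("1", {"field": "value"}, routing="asdf")

# Same routing as at index time: same shard, so the document is found.
print(get_doc("1", routing="asdf"))

# No routing: the lookup hashes "1" instead of "asdf". It misses whenever the
# two values land on different shards.
print(get_doc("1"))
```

In this model the unrouted lookup only succeeds by coincidence, when the _id and the routing value happen to hash to the same shard.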

Hope it helps!

Cheers,
LG


(Yehosef) #3

Hi - thanks for the reply.

The case I'm dealing with is when I want to find the document using the REST API approach but I don't know the routing value.

I'll use the example of the case I'm dealing with. Let's say that I have articles and comments and I want the comments to be children of the articles. I need to specify the routing of the comment according to its parent (the article). In my example below I have an article with id 1 and a comment with id 2.

Let's say that someone likes the comment and I want to send a notification to the comment author, so I need to find out who that is along with the other details of the comment. So I need to query the comment, but
GET /content/doc/2
won't (or might not) work because the comment is on the shard of article 1. If instead I did
GET /content/doc/2?routing=1
then it would work (and you can even put in multiple routing ids).

But what if, at the time I'm making this request, I don't know the parent of comment 2? I need to get the comment before I know the parent, and only then can I use the routing value. It's a catch-22.

I could do a query
GET content/doc/_search?q=id:2
and that will get me the document, but it's not in the same minimal REST format, and it means reading data one way and writing it another if I'm using the REST API.

Since I can already ask for
GET content/doc/2?routing=1,2,3,4,5
(even though that isn't guaranteed to find the document, since it depends on the hash and the number of shards),

I'm asking/suggesting that there should be a way to do
GET content/doc/2?routing=*
that would query all the shards, just as if I had passed enough routing values to hit every shard.

Does that help?

Thanks!


(Luiz Santos) #4

Hi @yehosef,

Thank you for elaborating.

A routing=* option might solve your problem, but I still think it's a data design issue. For example, is it possible to change how you index comments so that their ids include the article ID? (just an idea)

Cheers,
LG


(Yehosef) #5

I don't think it's a data-design issue. It's completely normal to have an item with a relationship/parent and not want to couple that to the document id. If you look at the examples for parent-child documents, they are all designed like this.

What you're suggesting would leave the comment with an id like "1-2", but then it gets really messy: what's the "2"? Should I just start counting child documents from the parent (first comment 1-1, second comment 1-2, etc.)? But then how do I auto-increment, and so on.

Like I said, I can already effectively force this by choosing routing values that, between them, match all the possible shards. E.g. if I have three shards and I figure out that the values "absd", "sdfsg", and "sagadasdfA" map to the three shards, I can pass those as the routing parameter to the GET request and it will work. But that's really hacky and brittle. It would be nicer if I could use something like routing=* to handle that for me.
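
A rough sketch of that hack, to show how the covering values could be found by brute force. This is an illustration only: `zlib.crc32` stands in for the hash Elasticsearch really uses, so the values it prints would not cover the shards of a real cluster, and `covering_routing_values` is a made-up helper.

```python
import itertools
import zlib


def pick_shard(routing: str, num_shards: int) -> int:
    # Stand-in hash; NOT the function a real cluster uses.
    return zlib.crc32(routing.encode("utf-8")) % num_shards


def covering_routing_values(num_shards: int) -> dict:
    """Return {shard: routing_value} with one routing value per shard."""
    found = {}
    for n in itertools.count():
        candidate = str(n)
        # Keep the first candidate seen for each shard.
        found.setdefault(pick_shard(candidate, num_shards), candidate)
        if len(found) == num_shards:
            return found


values = covering_routing_values(3)
# Comma-separated list in the shape the GET request accepts.
print("routing=" + ",".join(values[s] for s in sorted(values)))
```

The list has to be recomputed whenever the number of primary shards (or the hash) changes, which is the brittleness I mean.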


(Luiz Santos) #6

Hi @yehosef,

Ok! So let's wait and see whether someone can guide us to a better solution, and for comments on the issue you opened.
Hope you find a solution soon!

Cheers,
LG

