Custom routing and index aliases


(Matt Preston) #1

Hi,

I'm doing some experiments with custom routing and running into a few
problems. I've hit the "hotspots" issue described in the documentation,
where one routing value matches far too many documents (millions). The
documentation suggests separating these documents into their own index and
then using an alias to make the separation transparent. I have two questions
about this.

  1.  What if the route value matches too many documents for a single
      shard? Is there any way to use multiple shards in the second index even
      though it is identified by a single route? Otherwise, all the documents
      with the same route value end up on the same shard, regardless of how
      many shards I configure for the index.

  2.  Is it possible to combine routing and an alias that points to more
      than one index?
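Here is a toy model of why question 1 comes up. Elasticsearch chooses a document's shard from a hash of its routing value modulo the number of primary shards. As a result, all documents that share one routing value land on one shard. The sketch below uses `cksum` as a stand-in hash. That is an assumption: Elasticsearch uses its own hash function, which has differed between versions. The shard numbers printed here will therefore not match a real cluster. Only the "one routing value, one shard" property carries over.

```sh
#!/bin/sh
# Toy model of routing-based shard selection (NOT Elasticsearch's real hash):
#   shard = hash(routing) % number_of_primary_shards
# cksum is used only as a deterministic stand-in hash.
shard_for() {
  local routing="$1" num_shards="$2" hash
  # cksum prints "<checksum> <byte count>"; keep only the checksum.
  hash=$(printf '%s' "$routing" | cksum | cut -d' ' -f1)
  echo $(( hash % num_shards ))
}

# Every document routed with "4" maps to one and the same shard, however
# many shards the index has; more shards only changes *which* shard it is.
for shards in 1 2 5 10; do
  echo "routing=4, shards=$shards -> shard $(shard_for 4 "$shards")"
done
```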

For example, take two indexes: index1 (3 shards) containing route values "1",
"2" and "3", and index2 (1 shard) containing route value "4". I created an
alias like this:

curl -XPOST 'http://localhost:9200/_aliases' -d '
{
  "actions" : [
    { "add" : { "index" : "index1", "alias" : "alias", "search_routing" : "1,2,3" } },
    { "add" : { "index" : "index2", "alias" : "alias", "search_routing" : "4" } }
  ]
}'

I was expecting that when searching the alias using a single route value
only a single shard would get hit, but that doesn't seem to be the case.

This query hits 2 shards

curl -XGET 'http://localhost:9200/alias/_search?q=text:foo&routing=1'

This query hits 4 shards

curl -XGET 'http://localhost:9200/alias/_search?q=text:foo&routing=4'

It appears that search requests are being sent to both indexes, ignoring the
"search_routing" parameter in the alias definition. Am I doing something
wrong?
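The model below reproduces the shard counts above. It rests on an assumption inferred from these numbers, not taken from documentation. The assumption is that, for each index behind the alias, the request's `routing` is intersected with that index's `search_routing`. An index whose intersection is empty is then searched on all of its shards instead of being skipped. The `shards_hit` function and its `index:shards:search_routing` argument format are invented for this sketch.

```sh
#!/bin/sh
# Hypothetical model of how many shards a search on the alias touches.
# Assumption (inferred from the counts reported in this thread):
#   - per index, the request routing is intersected with search_routing;
#   - a non-empty intersection with one value hits exactly one shard;
#   - an empty intersection falls back to searching ALL shards of that index.
# Args: request routing value, then one "index:shards:search_routing" per index.
shards_hit() {
  local routing="$1" total=0 entry shards alias_routing
  shift
  for entry in "$@"; do
    shards=${entry#*:}; shards=${shards%%:*}   # middle field: shard count
    alias_routing=${entry#*:*:}                # last field: "1,2,3" etc.
    # Wrap both sides in commas so "1" does not match inside "11".
    case ",$alias_routing," in
      *",$routing,"*) total=$(( total + 1 )) ;;        # routed: one shard
      *)              total=$(( total + shards )) ;;   # empty: every shard
    esac
  done
  echo "$total"
}

# First setup: index1 (3 shards, "1,2,3") and index2 (1 shard, "4").
shards_hit 1 "index1:3:1,2,3" "index2:1:4"   # prints 2
shards_hit 4 "index1:3:1,2,3" "index2:1:4"   # prints 4
```

Under the same assumption, the 10-shard/2-shard layout in post #3 gives 3 shards for routing "1" and 11 shards for routing "4". Those are the counts Matt reports there.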

Thanks,

Matt


(David Pilato) #2

You cannot use an alias when indexing.
A routing key is used by Elasticsearch to determine which shards are going to be used to run your query (or your get operation).

So create two indexes, index1 and index2.

Then index your first documents into index1 using whatever routing key you want. index1 can have as many shards as needed.
When a shard is "full", index your new documents into index2 using the same routing key as before.

An alias on top of index1 and index2 is then used for searching. You can pass the routing key when searching the alias (which is basically the same as searching index1,index2).

Does it help to understand how alias and routing work?

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr

In reply to matthew.preston@thomsonreuters.com, 27 November 2013 at 14:46:03.


(Matt Preston) #3

Hi,

So, based on what you’re saying, and using my earlier example: I want index1
to contain route values “1”, “2” and “3”, and index2 to contain route value
“4” only. When indexing, I calculate the route value for each document. If
it’s in [1, 2, 3], I index the document into index1 (with routing); if
it’s 4, I index it into index2 (without routing).
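The indexing rule above could be sketched as follows. The thread doesn't show how the route value is computed for each document, so the sketch takes it as an input. The function name `target_for` is hypothetical. Leaving index2 unrouted means its documents are distributed by `_id` across all of its shards instead of piling onto one.

```sh
#!/bin/sh
# Hypothetical helper: given a document's route value, print the target
# index and, for index1 only, the routing value to send with the request.
target_for() {
  case $1 in
    1|2|3) echo "index1 routing=$1" ;;  # custom routing: one shard of index1
    4)     echo "index2" ;;             # no routing: spread over index2 by _id
    *)     echo "unexpected route value: $1" >&2; return 1 ;;
  esac
}

for route in 1 2 3 4; do
  target_for "$route"
done
```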

The remaining problem is that I cannot query one alias over both indexes at
the same time and have the routing parameter restrict the search to only the
routed shards.

I create the alias like this:

curl -XPOST 'http://localhost:9200/_aliases' -d '
{
  "actions" : [
    { "add" : { "index" : "index1", "alias" : "alias", "search_routing" : "1,2,3" } },
    { "add" : { "index" : "index2", "alias" : "alias", "search_routing" : "4" } }
  ]
}'

For argument’s sake, let’s say index1 has 10 shards and index2 has 2
shards.

In this query, I’d expect only a single shard to get hit (from index1) as I
have passed “1” as the routing parameter, but according to the response it
actually hit 3 shards.

curl -XGET 'http://localhost:9200/alias/_search?q=text:foo&routing=1'

In this query, I’d expect 2 shards to get hit (the 2 shards of index2),
because I passed “4” as the routing parameter, but the response says it hit
11 shards.

curl -XGET 'http://localhost:9200/alias/_search?q=text:foo&routing=4'

How can I configure the alias so that the queries get routed to the right
shards? Am I really allowed to make an alias over two indexes, one that
uses custom routing and one that doesn’t?

Thanks,

Matt

In reply to David Pilato, sent 27 November 2013 14:40.

