Aging strategy described in the ES website's "Retiring Data" document isn't working for me

Greetings,

I'm setting up a fresh cluster of two machines on AWS to serve as an ELK log
server stack. I've imported 4 months' worth of log data to play with on
the new cluster.

I'm trying to reproduce the behaviors explained in the "Retiring Data"
document on the ES website:

http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/retiring-data.html

Specifically, I'm trying to cause indexes to move from the primary machine
to the secondary machine. I've set the node.box_type setting via the
elasticsearch.yml file to "recent" on the primary machine and "older" on
the secondary machine.
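Concretely, that's just one line in each node's elasticsearch.yml:

```yaml
# elasticsearch.yml on the primary machine
node.box_type: recent

# elasticsearch.yml on the secondary machine
node.box_type: older
```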

To migrate an index from the primary machine ("recent") in the cluster to
the secondary machine ("older"), I'm executing the following set of API
calls:

curl -XPOST http://es1.vpc3.ftaws.net:9200/logstash-2014.08.02/_close
curl -XPUT http://es1.vpc3.ftaws.net:9200/logstash-2014.08.02/_settings -d '{
  "index" : {"index.routing.allocation.include.box_type" : "older"}
}'
curl -XPOST http://es1.vpc3.ftaws.net:9200/logstash-2014.08.02/_open

When I do this, only 1 of the 5 shards swaps the positions of its live and
backup copies such that the live copy ends up on the secondary machine. So
I get 4 shards live on "recent" and one live on "older". What's more, if I
bring up just the secondary machine initially and let the indexes rebuild
so that all 5 live shards of each index are on the secondary machine, then
bring up the primary machine and execute the above calls, 4 of the 5
shards swap, so that, as in the first case, only one of the shards ends up
with its live copy on the secondary machine.

Here's a picture of what's going on from the elasticsearch-head console app:

https://lh4.googleusercontent.com/-e8w8Q0xzTHA/VJ9fG2tVgQI/AAAAAAAAAkg/KuRihe5KcwU/s1600/ShardState.tiff

I ran the above calls on each of the first two displayed indexes. Note
that the one shard that moves is different in the two cases. The third
displayed index shows the "before" state, an index that I have not yet tried
to migrate. I've confirmed the state of each shard using the API just to
be sure that it isn't just the console app that's misreporting.
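For example, the cat shards API shows the role and location of each copy (a sketch; the host name is from my setup above):

```shell
# "p" marks the primary copy, "r" the replica; the node column shows
# which machine each copy currently lives on
curl -XGET 'http://es1.vpc3.ftaws.net:9200/_cat/shards/logstash-2014.08.02?v'
```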

Here are the important bits of the settings of the two machines:

primary:

{
  "name": "es1-vpc3",
  ...
  "attributes": {
    "box_type": "recent"
  },
  ...
  "cluster": {
    "name": "es-vpc3"
  },
  "node": {
    "name": "es1-vpc3",
    "box_type": "recent"
  },
  ...
}

secondary:

{
  "name": "es2-vpc3",
  ...
  "attributes": {
    "box_type": "older"
  },
  ...
  "cluster": {
    "name": "es-vpc3"
  },
  "node": {
    "name": "es2-vpc3",
    "box_type": "older"
  },
  ...
}

What am I missing? Is this a bug?

TIA for any help you can provide.

Steve

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ccbf327b-5651-4750-af22-4bc4612940d4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

You don't need to close the index to assign a tag; just assign it
dynamically.

But you have replicas, so there will always be shards assigned to both
nodes. You need to drop the replica count to 0 for this to work the way you
are looking for.
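Something like this (a sketch, reusing the host and index names from your post):

```shell
# Retag the index dynamically; no _close/_open required
curl -XPUT 'http://es1.vpc3.ftaws.net:9200/logstash-2014.08.02/_settings' -d '
{ "index.routing.allocation.include.box_type": "older" }'

# Drop the replica so the single remaining copy can relocate to the "older" node
curl -XPUT 'http://es1.vpc3.ftaws.net:9200/logstash-2014.08.02/_settings' -d '
{ "number_of_replicas": 0 }'
```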

On 28 December 2014 at 12:40, Steve Johnson blatwurst@gmail.com wrote:

Thanks, Mark, for pointing out the first part about not needing the
close/open. I must have picked those up while fooling around trying to
effect the correct behavior. I originally followed the post I mentioned,
which did not have the close/open.

For the second point... I want a replica of each shard. The issue is where
the active vs. backup shard lives. Notice the shard boxes in my image with
darker outlines vs. those without. The issue is illustrated in the first
two indexes by which of each pair of shards is outlined.

Thanks for your input!

It doesn't matter where the primaries or replicas live; they are the same
thing.

If you only want to query the second node then send your queries to it and
use local preference -
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-preference.html
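e.g. something like this (a sketch; assuming your second node answers on es2.vpc3.ftaws.net):

```shell
# Send the search to the second node and prefer its local shard copies
curl -XGET 'http://es2.vpc3.ftaws.net:9200/logstash-2014.08.02/_search?preference=_local' -d '
{ "query": { "match_all": {} } }'
```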

On 29 December 2014 at 02:08, Steve Johnson blatwurst@gmail.com wrote:


Hi Mark,

Respectively, your comments go against what the document I’m working from on the ES site suggests. I want to be able to query all the indexes in the cluster in a single query. The point here is to spread the load of dealing with particular indexes across the cluster’s nodes such that the most usual case of making queries against the last few days’ data is optimized.

Here is an encapsulation of the relevant part of that document:

Migrate old indices

With logging data there is likely to be one "hot" index — the index for today. All new documents will be added to that index, and almost all queries will target that index. It should use your best hardware.

...

We can ensure that today's index is on our strongest boxes by creating it with the following settings:

PUT /logs_2014-10-01
{
  "settings": {
    "index.routing.allocation.include.box_type" : "strong"
  }
}

Yesterday's index no longer needs to be on our strongest boxes, so we can move it to the nodes tagged as medium by updating its index settings:

POST /logs_2014-09-30/_settings
{
  "index.routing.allocation.include.box_type" : "medium"
}

This discussion is talking in terms of cluster management, and says nothing about having to change which IP the users of ES have to connect to. What's said in the first paragraph, "almost all queries will target that index", suggests that the index being moved to the second machine will sometimes be the target of the same query source that usually targets the newer indexes on the primary machine. Besides this documentation, my own experience says that one box in an ES cluster usually serves as the front end for the entire cluster, despite the fact that it may be asked to query indexes on other members of the cluster. If this were not true, what use would there be in clustering ES beyond basic redundancy?

The whole point of this section is controlling which node actually hosts a particular index. If it didn't matter which node had the primary copy of an index and which had the backup (or if there were no distinction), then this whole section would make no sense in the face of redundancy, and the fact that the elasticsearch-head console makes a point of distinguishing the two visually would be useless. Said yet another way, what good is the "primary" field in an index's metadata, which is always "true" in exactly one copy of that index, if there is no distinction between particular copies of the same index in a cluster?

Again, I’m trying to make sense of this documentation specifically. I want to be able to reduce load on my primary ES node by migrating indexes to other nodes in the cluster. I’m looking for someone to tell me what I’m doing wrong in either interpreting or executing what is explained in this doc, or to tell me that what I’m seeing is a bug in the software.

If anyone else has any advice on this, I’d greatly appreciate it.

Steve

On Dec 29, 2014, at 1:18 PM, Mark Walkom markwalkom@gmail.com wrote:

Ooops. "Respectively"? I meant "Respectfully".

On Wednesday, December 31, 2014 12:01:30 PM UTC-8, Steve Johnson wrote:

Except the docs there make the assumption that you have multiples of each
node type (i.e. strong/medium) to spread the data across. That is the key
thing; it's not just one of each!

Unless you remove the replica or add more nodes, you will always have data
on both nodes. However, you should probably just look at adding more nodes
in general, as you'll eventually reach a point where even your "non-primary"
nodes are overloaded.

On 1 January 2015 at 07:08, Steve Johnson steve@parisgroup.net wrote:

Dude!

Thanks for sticking with me on this one! With your recent comments, I turned off replication and then tried my tests again. I thought I’d tried this before, but I must have botched it the first time. With replication off, I now get the behavior I’m looking for. Thanks so much!

So I now see that maybe what you said in your prior response was correct. If, when you said "It doesn't matter where the primary or replicas live, they are the same thing", you meant this only from the standpoint of how ES decides which node each shard lives on and not how they are actually used, then we are now totally in sync. My mistake was in thinking that the "index.routing.allocation.include" property didn’t apply to replica shards, but obviously it does. So if the cluster can’t satisfy that directive because there aren’t enough "older" machines to hold both the primary and replica shards, it has to use a "recent" machine for one of them, and it decides which arbitrarily. That makes sense, even if it’s not the behavior I would prefer (why not always put the primary on the requested machine?). But I can live with this.

So my problem was indeed a bad assumption on my part. Thanks again and have a great New Year!

Steve

On Dec 31, 2014, at 7:00 PM, Mark Walkom markwalkom@gmail.com wrote:

Dude!

Thanks for sticking with me on this one! With your recent comments, I turned off replication and then tried my tests again. I thought I’d tried this before, but I must have botched it the first time. With replication off, I now get the behavior I’m looking for. Thanks so much!

So I now see that maybe what you said in your prior response was correct. If when you said "It doesn't matter where the primary or replicas live, they are the same thing", you meant this only from the standpoint of how YS decides which node each shard lives on and not how they are actually used, then we are now totally in sync. My mistake was in thinking that the 'index.routing.allocation.include' property didn't apply to replica shards, but obviously it does. So if the cluster can't satisfy that directive because there aren't enough "older" machines to hold both the primary and replica shards, it has to use a "recent" machine for one of them, and it decides which arbitrarily. That makes sense, even if it isn't the behavior I would prefer (why not always put the primary on the requested machine?). But I can live with this.
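For anyone else landing on this thread, the sequence that ended up working can be sketched as follows (hostname and index name reused from the earlier messages; adjust both for your own cluster):

```shell
# With a single "older" node and one replica per shard, the allocation
# filter below can't be fully satisfied: both copies of a shard would
# have to land on the same node. So drop replicas for the index first:
curl -XPUT http://es1.vpc3.ftaws.net:9200/logstash-2014.08.02/_settings -d '
{ "index" : { "number_of_replicas" : 0 } }'

# Now the include filter (which applies to primaries and replicas alike)
# can move the remaining copy of every shard to the "older" machine:
curl -XPUT http://es1.vpc3.ftaws.net:9200/logstash-2014.08.02/_settings -d '
{ "index" : { "index.routing.allocation.include.box_type" : "older" } }'
```

As far as I can tell, both settings are dynamic, so the close/open steps from the original message shouldn't be necessary.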

So my problem was indeed a bad assumption on my part. Thanks again and have a great New Year!

Steve

Good to hear :)

Also, you seemed to type YS instead of ES, not sure if that is a typo or
not!?

On 2 January 2015 at 06:46, Steve Johnson steve@parisgroup.net wrote:

Ha! Yeah, I don’t know how I got ‘YS’ in my head… I meant ‘ES’, of course.
