Can I force indexing onto one specific node of a cluster?

Hi,

 I plan to set up an ES cluster on EC2 like this:

       1:   t1.small (node.data: false, http.enabled: true)     - the front node
       2-4: t1.small (node.data: true, http.enabled: false)     - the workers that serve the queries
       5:   *t1.medium* (node.data: true, http.enabled: true)   - a more powerful instance to handle index updating

 My indices do not need to be updated in real time. I need to update them
regularly (say once a week, with a batch of changes). Ideally, I would like
to force the re-indexing to happen only on node #5 (without syncing with
other nodes), then run some validation tests on the updated indices. If
everything is OK, I can have the updated indices replicated to worker nodes
2-4. Note that while I am re-indexing on node #5, nodes 1-4 should stay live
to serve queries against the old indices.

 Another benefit is that I only need to start node #5 when I need to update
indices. I can shut it down when it is not in use to save cost. For serving
queries, my other small-instance nodes are sufficient.

 Any thoughts? Or maybe a different cluster setting?

Thanks,

Chunlei

--

I prefer to have each node equivalent and spread the search/index load
equally, granted I'm running on physical h/w. There could be benefit in
having an offline rebuild node, though.

To answer your question, yes, you can use the shard allocation APIs:
http://www.elasticsearch.org/guide/reference/index-modules/allocation.html
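
As a rough sketch of how index-level allocation filtering could be used here (not a tested recipe): assume node 5 is started with a custom node attribute such as `node.tag: indexer` in its elasticsearch.yml and the workers with `node.tag: worker`. The attribute name `tag`, its values, the index name `products_v2`, and the `ES_URL` variable are all made up for illustration. The new index is created with no replicas and pinned to the indexer node; after validation, the filter is switched to the workers and replicas are added. Existing indices may need a similar filter so their shards are not rebalanced onto node 5 when it joins.

```shell
# Hypothetical names throughout: ES_URL, NEW_INDEX, and the "tag" node attribute.
ES_URL="${ES_URL:-http://localhost:9200}"
NEW_INDEX="products_v2"

# Settings used at creation time: keep all shards on the node tagged
# "indexer" and create no replicas, so nothing is copied to the workers yet.
create_body='{"settings": {"index.number_of_replicas": 0, "index.routing.allocation.include.tag": "indexer"}}'

# Settings applied after validation: move the shards to the nodes tagged
# "worker" and add two replicas so each of the three workers can hold a copy.
release_body='{"index.number_of_replicas": 2, "index.routing.allocation.include.tag": "worker"}'

create_pinned_index() {
  curl -XPUT "$ES_URL/$NEW_INDEX" \
    -H 'Content-Type: application/json' -d "$create_body"
}

release_to_workers() {
  curl -XPUT "$ES_URL/$NEW_INDEX/_settings" \
    -H 'Content-Type: application/json' -d "$release_body"
}
```

Running `create_pinned_index` before the weekly batch and `release_to_workers` after the validation tests pass would follow the workflow described in the question. The Content-Type header is only required by newer Elasticsearch releases.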

You'll also likely want to look at aliasing in order to swap in a new index:
http://www.elasticsearch.org/guide/reference/api/admin-indices-aliases.html
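
A minimal sketch of the alias swap, assuming queries go through an alias rather than a concrete index name. The alias `products`, the indices `products_v1` and `products_v2`, and the `ES_URL` variable are hypothetical. Both actions are sent in a single request to the `_aliases` endpoint, so the switch from the old index to the new one happens in one step.

```shell
# Hypothetical names: ES_URL, ALIAS, OLD_INDEX, NEW_INDEX.
ES_URL="${ES_URL:-http://localhost:9200}"
ALIAS="products"
OLD_INDEX="products_v1"
NEW_INDEX="products_v2"

# Build the request body: remove the alias from the old index and add it
# to the new one in the same actions list.
alias_swap_body() {
  printf '{"actions": [{"remove": {"index": "%s", "alias": "%s"}}, {"add": {"index": "%s", "alias": "%s"}}]}\n' \
    "$OLD_INDEX" "$ALIAS" "$NEW_INDEX" "$ALIAS"
}

swap_alias() {
  curl -XPOST "$ES_URL/_aliases" \
    -H 'Content-Type: application/json' -d "$(alias_swap_body)"
}
```

Calling `swap_alias` once the new index has been validated would point searches at `products_v2`; the old index can then be deleted or kept for rollback.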

Best Regards,
Paul

(In reply to Chunlei Wu's message of Saturday, January 26, 2013 10:46:17 AM UTC-7.)

--

Thanks a lot. That should work for me. With the hints you gave, I also found
a cluster-wide setting that could be useful for my case:

http://www.elasticsearch.org/guide/reference/modules/cluster.html

curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "cluster.routing.allocation.exclude._ip" : "10.0.0.1"
    }
}'

That way, I can temporarily exclude a node (probably without changing the
index name), and add it back once the indexing is done.
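
For the "add it back" step, a sketch under the assumption that clearing the filter is done by sending the same transient setting with an empty value (newer Elasticsearch versions also accept null for this). The `ES_URL` and `NODE_IP` variables and the function names are made up for illustration; the IP is the one from the example above.

```shell
# Hypothetical names: ES_URL, NODE_IP, and the helper functions below.
ES_URL="${ES_URL:-http://localhost:9200}"
NODE_IP="10.0.0.1"

# Build the transient cluster setting; an empty argument clears the filter.
exclude_body() {
  printf '{"transient": {"cluster.routing.allocation.exclude._ip": "%s"}}\n' "$1"
}

# Stop allocating shards to the node (existing shards are moved off it).
exclude_node() {
  curl -XPUT "$ES_URL/_cluster/settings" \
    -H 'Content-Type: application/json' -d "$(exclude_body "$NODE_IP")"
}

# Allow shards on the node again by clearing the exclusion.
include_node_again() {
  curl -XPUT "$ES_URL/_cluster/settings" \
    -H 'Content-Type: application/json' -d "$(exclude_body "")"
}
```

Because the setting is transient, it is also dropped on a full cluster restart.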

Chunlei

(In reply to ppearcy's message of Sunday, January 27, 2013 8:34:23 PM UTC-8.)
