Can I force indexing onto one specific node of a cluster?

Chunlei_Wu · January 26, 2013, 5:46pm

Hi,

I plan to set up an ES cluster on EC2 like this:

       1:   t1.small  (node.data: false, http.enabled: true)  - the front node
       2-4: t1.small  (node.data: true,  http.enabled: false) - the workers that serve queries
       5:   t1.medium (node.data: true,  http.enabled: true)  - a more powerful instance to handle index updates
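
A minimal sketch of how the three roles above could be passed as startup flags, assuming a 0.x-era Elasticsearch where `bin/elasticsearch -Des.<setting>=<value>` sets a node setting and an unreserved `node.*` key becomes a custom node attribute. The `node.tag=indexer` attribute on node #5 is hypothetical; it is only there so that allocation filters can target that node by name. Newer releases renamed these settings, so check the docs for your version.

```shell
# Hypothetical helper: print the startup flags for one role in the layout.
# Assumes the 0.x-era "-Des.<setting>" syntax and a custom attribute "node.tag".
node_flags() {
  case "$1" in
    front)   echo "-Des.node.data=false -Des.http.enabled=true" ;;                   # node 1
    worker)  echo "-Des.node.data=true -Des.http.enabled=false" ;;                   # nodes 2-4
    indexer) echo "-Des.node.data=true -Des.http.enabled=true -Des.node.tag=indexer" ;; # node 5
    *)       echo "unknown role: $1" >&2; return 1 ;;
  esac
}

# Usage on the node-5 host (ES_HOME is an assumed install path):
#   "$ES_HOME/bin/elasticsearch" $(node_flags indexer)
```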

My indices do not need to be updated in real time. I need to update them
regularly (say once a week, with a batch of changes). Ideally, I would like
to force the re-indexing to happen only on node #5 (without syncing with the
other nodes), then run some validation tests on the updated indices. If
everything is OK, I can have the updated indices replicated to worker nodes
2-4. Note that while I am re-indexing on node #5, nodes 1-4 should stay live
the whole time to serve queries against the old indices.
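
One way to get the "index only on node #5" behaviour is to build each weekly batch into a fresh index whose shards are pinned to the indexer node. A minimal sketch, assuming node #5 carries the hypothetical attribute `node.tag=indexer` and that index-level `index.routing.allocation.include.<attribute>` filtering is available in your version; the index name `myindex_v2` is also hypothetical. Replicas are set to 0 because a replica can never share a node with its primary, so with only one eligible node any replicas would simply stay unassigned.

```shell
ES_URL="${ES_URL:-http://localhost:9200}"

# Body for creating an index whose shards may only be allocated to nodes whose
# custom attribute "tag" equals the given value, with no replicas.
staging_settings() {
  printf '{"settings": {"index.number_of_replicas": 0, "index.routing.allocation.include.tag": "%s"}}' "$1"
}

# Create the new index (first argument) pinned to nodes with the given tag.
create_staging_index() {
  curl -s -XPUT "$ES_URL/$1" -d "$(staging_settings "$2")"
}

# Usage against a live cluster:
#   create_staging_index myindex_v2 indexer
#   ...bulk-load myindex_v2, then run validation queries against it...
```

The old index is not touched by any of this, so it keeps serving queries while the new one is loaded and checked.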

Another benefit is that I only need to start node #5 when I have to update
indices, and I can shut it down when it's not in use to save cost. For
serving queries, the other small instances are sufficient.

 Any thoughts? Or maybe a different cluster setting?

Thanks,

Chunlei

--

ppearcy · January 28, 2013, 4:34am

I prefer to have every node equivalent and spread the search/index load
equally, but granted, I'm running on physical hardware. There could be a
benefit in having an offline rebuild node, though.

To answer your question: yes, you can use the shard allocation APIs.
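
Once the new index validates, the same allocation settings can release it to the workers. A minimal sketch, assuming the index was pinned with the hypothetical `index.routing.allocation.include.tag=indexer`: it clears that filter with an empty value (assumed to remove the filter in your version), excludes the indexer tag so shards relocate off node #5, and raises the replica count. It then waits on cluster health so node #5 is not shut down while shards are still moving; the health parameter names are those of older releases and may differ in yours.

```shell
ES_URL="${ES_URL:-http://localhost:9200}"

# Body for PUT /<index>/_settings: drop the include filter, keep shards off
# nodes carrying the given tag, and set the replica count for the workers.
release_settings() {
  printf '{"index.routing.allocation.include.tag": "", "index.routing.allocation.exclude.tag": "%s", "index.number_of_replicas": %d}' "$1" "$2"
}

# Apply the settings, then block until all shards are assigned and none are
# relocating (or until the 10-minute timeout expires).
release_index() {
  curl -s -XPUT "$ES_URL/$1/_settings" -d "$(release_settings "$2" "$3")"
  curl -s "$ES_URL/_cluster/health?wait_for_status=green&wait_for_relocating_shards=0&timeout=10m"
}

# Usage: release_index myindex_v2 indexer 1
```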

You'll also likely want to look at aliasing in order to swap in a new index.
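
A minimal sketch of the alias swap, assuming clients always query an alias (the hypothetical name `myindex`) rather than a concrete index. Both actions go in a single `_aliases` request, which is applied atomically, so queries switch from the old index to the new one without a gap; the old index can be deleted afterwards.

```shell
ES_URL="${ES_URL:-http://localhost:9200}"

# Body that moves alias $1 from old index $2 to new index $3 in one request.
swap_alias_body() {
  printf '{"actions": [{"remove": {"index": "%s", "alias": "%s"}}, {"add": {"index": "%s", "alias": "%s"}}]}' "$2" "$1" "$3" "$1"
}

swap_alias() {
  curl -s -XPOST "$ES_URL/_aliases" -d "$(swap_alias_body "$1" "$2" "$3")"
}

# Usage: swap_alias myindex myindex_v1 myindex_v2
```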

Best Regards,
Paul

--

Chunlei_Wu · January 28, 2013, 8:25pm

Thanks a lot. That should work for me. With the hints you gave, I also found
this cluster-wide setting, which could be useful for my case:

    curl -XPUT localhost:9200/_cluster/settings -d '{
        "transient" : {
            "cluster.routing.allocation.exclude._ip" : "10.0.0.1"
        }
    }'

That way, I can temporarily exclude a node (probably without having to
change the index name) and add it back once I'm done with the indexing.
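
A minimal sketch of that cluster-wide filter as a small helper, assuming node #5 sits at a hypothetical IP. Unlike the per-index settings, this applies to every index: setting it drains all shards off the matching node (useful just before shutting node #5 down), but it also keeps new indices off that node, so it has to be cleared before the next rebuild. An empty value is assumed to clear the filter, and because the setting is transient it is also lost on a full cluster restart.

```shell
ES_URL="${ES_URL:-http://localhost:9200}"

# Body for PUT /_cluster/settings that excludes nodes with the given IP
# from shard allocation (an empty string is assumed to lift the exclusion).
exclude_ip_body() {
  printf '{"transient": {"cluster.routing.allocation.exclude._ip": "%s"}}' "$1"
}

set_excluded_ip() {
  curl -s -XPUT "$ES_URL/_cluster/settings" -d "$(exclude_ip_body "$1")"
}

# Usage:
#   set_excluded_ip 10.0.0.5   # drain the node at 10.0.0.5 (hypothetical IP)
#   set_excluded_ip ""         # allow allocation to it again
```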

Chunlei
