I plan to set up an ES cluster on EC2 like this:
1: t1.small (node.data: false, http.enabled: true) - the
front node
2-4: t1.small (node.data: true, http.enabled: false) - the
workers that serve the queries
5: *t1.medium* (node.data: true, http.enabled: true) - a more
powerful instance to handle index updating
My indices do not need to be updated in real time. I need to update
them regularly (say once a week, with a batch of changes). Ideally, I
would like to force the re-indexing to happen only on node #5 (without
syncing with the other nodes), then run some validation tests on the
updated indices. If everything is OK, I can have the updated indices
replicated to worker nodes 2-4. One thing to note is that while I am
re-indexing on node #5, nodes 1-4 should stay live and keep serving
queries against the old indices.
Another benefit is that I only need to start node #5 when I need to
update the indices. I can shut it down when it's not in use to save
cost. For serving queries, my other small-instance nodes are
sufficient.
Any thoughts? Or maybe a different cluster setting?
I prefer to have each node equivalent and spread the search/index load
equally, granted I'm running on physical h/w. There could be benefit in
having an offline rebuild node, though.
To answer your question, yes, you can use the shard allocation APIs:
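A minimal sketch of how that could look with index-level allocation filtering, assuming the nodes are started with names such as `node2` through `node5` (hypothetical) and that your ES version supports filtering on the built-in `_name` attribute (worth checking against the docs for your release). The code only builds the JSON bodies you would PUT to `/<index>/_settings`; it does not talk to a cluster.

```python
import json

# Index-level allocation filter; shards may only live on nodes whose
# name matches one of the comma-separated values.
SETTING = "index.routing.allocation.include._name"


def build_phase_settings(rebuild_node):
    # While rebuilding: keep every shard on the rebuild node and use no
    # replicas, so nothing is copied to the query workers yet.
    return {SETTING: rebuild_node, "index.number_of_replicas": 0}


def publish_phase_settings(worker_nodes, replicas=1):
    # After validation: allow shards only on the workers and add replicas,
    # which should make the cluster relocate shards off the rebuild node.
    return {
        SETTING: ",".join(worker_nodes),
        "index.number_of_replicas": replicas,
    }


if __name__ == "__main__":
    # Node names here are assumptions for illustration only.
    print(json.dumps(build_phase_settings("node5"), indent=2))
    print(json.dumps(publish_phase_settings(["node2", "node3", "node4"]), indent=2))
```

Overwriting the same `include._name` setting in the second phase avoids having to clear a filter. Once the shards have relocated to the workers, the rebuild node should be safe to shut down.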
You'll also likely want to look at aliasing in order to swap in a new index:
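As a rough illustration, assuming queries go through an alias (here the made-up name `products`) rather than a concrete index name, the swap is a single request to `/_aliases` carrying a remove and an add action together. The index names are hypothetical, and the snippet only builds the request body.

```python
import json


def swap_alias_body(alias, old_index, new_index):
    # Both actions go in one request so the alias never points at nothing.
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    # POST this JSON to /_aliases once the new index has passed validation.
    body = swap_alias_body("products", "products_v1", "products_v2")
    print(json.dumps(body, indent=2))
```

With this approach each weekly rebuild goes into a fresh index name, and the old index can be deleted after the alias has moved.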
Best Regards,
Paul
(Quoted: Chunlei Wu's original post of Saturday, January 26, 2013 10:46:17 AM UTC-7, shown above.)
That way, I can temporarily exclude a node (probably without changing the
index name), and add it back once the indexing is done.
Chunlei
(Quoted: ppearcy's reply of Sunday, January 27, 2013 8:34:23 PM UTC-8, shown above.)