I could use some help verifying my understanding of how persistence works.
Here is my understanding:
Regardless of whether an index is stored in memory or on the file system, it
is considered temporary and is removed when the node stops; hence, if all
the nodes in the cluster stop, the indices would be lost.
As such, a write-behind gateway needs to be used for persistence. The
gateway keeps a transaction log and (periodically?) snapshots the indices.
If all the nodes in the cluster were stopped and restarted, the indices and
transaction logs stored by the gateway would be used to recreate the node
indices.
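If I have this right, the gateway is just node-level configuration. Here is
a minimal sketch in Java of what I mean, assuming the NodeBuilder-style API
and the gateway.type / gateway.fs.location setting names I found in the
guide (the /mnt/es-gateway path is made up):

import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.node.Node;
import static org.elasticsearch.node.NodeBuilder.nodeBuilder;

public class GatewayNode {
    public static void main(String[] args) {
        // "fs" gateway: cluster metadata and a transaction log are written
        // to a shared file system, so a full-cluster restart can recover
        // the indices from it.
        Node node = nodeBuilder()
                .settings(ImmutableSettings.settingsBuilder()
                        .put("gateway.type", "fs")
                        .put("gateway.fs.location", "/mnt/es-gateway")) // made-up path
                .node();
        node.close();
    }
}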
Is this right?
Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype
On Fri, Mar 26, 2010 at 1:10 PM, Tim Robertson <timrobertson100@gmail.com> wrote:
Thanks Shay,
So... reading between the lines, does it then use protobufs (or something
else?) for RPC, instead of serializing and deserializing JSON?

Cheers,
Tim

On Fri, Mar 26, 2010 at 3:58 PM, Shay Banon <shay.banon@elasticsearch.com> wrote:
Hi,
You won't enjoy locality between elasticsearch and hadoop in any case,
since the two use different distribution models. Locality would only make
sense for the indexing part, and I think you probably won't really need
it (it should be fast enough).

What language are you going to write your jobs in? If Java, then make
use of the native Java client (obtained from a "non data" Server you start)
rather than HTTP. More here:
http://www.elasticsearch.com/docs/elasticsearch/java_api/client/#Server_Client
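Roughly along these lines (a sketch, not exact code: the class names here
follow the NodeBuilder-style API rather than the Server one in the docs,
and the index and type names are placeholders):

import org.elasticsearch.client.Client;
import org.elasticsearch.node.Node;
import static org.elasticsearch.node.NodeBuilder.nodeBuilder;

public class NativeClientExample {
    public static void main(String[] args) {
        // Start a "non data" node: it joins the cluster and routes
        // requests, but holds no shards itself.
        Node node = nodeBuilder().data(false).node();
        Client client = node.client();

        // Index one document over the native transport instead of HTTP.
        // "books"/"book" are placeholder index/type names.
        client.prepareIndex("books", "book", "1")
              .setSource("{\"title\":\"example\"}")
              .execute()
              .actionGet();

        node.close();
    }
}

-shay.banon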
On Fri, Mar 26, 2010 at 5:35 PM, Tim Robertson <timrobertson100@gmail.com> wrote:
Hey,
Is anyone building their indexes using Hadoop? If so, are they deploying
ES across the same cluster as Hadoop and trying to reduce network noise by
making use of data locality, or keeping the clusters separate and just
calling over HTTP from MapReduce when building the indexes? I am about to
set up on EC2 and plan to keep the search and processing machines
separate.
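For the HTTP route, something like this is what I have in mind on the
reduce side (the host, port, and index/type names are made up; plain
java.net so the reducer needs no extra jars):

import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class HttpIndexing {
    // PUT one JSON document into elasticsearch over its REST API.
    static void indexDoc(String id, String json) throws IOException {
        // "es-master" host and "occurrences"/"record" names are made up.
        URL url = new URL("http://es-master:9200/occurrences/record/" + id);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        OutputStream out = conn.getOutputStream();
        try {
            out.write(json.getBytes("UTF-8"));
        } finally {
            out.close();
        }
        if (conn.getResponseCode() >= 300) {
            throw new IOException("Indexing failed: HTTP " + conn.getResponseCode());
        }
        conn.disconnect();
    }

    public static void main(String[] args) throws IOException {
        indexDoc("1", "{\"species\":\"example\"}");
    }
}

Cheers,
Tim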