On 7/24/2012 10:04 PM, GX wrote:
[...] Ivan, I don't understand why the default settings (5 shards and 1
replica) would make any sense, does that mean that all nodes can query
the same data from all shards and replicas?
I have only recently been working with Elastic Search myself, so I
can sympathize with the problem of terminology.
One of the phrases you used suggests you might not have all the terms
straightened out.
"does that mean that all nodes can query the same data..." isn't quite
on the mark.
A node is one running instance of Elastic Search (usually one per server).
Nodes are organized into clusters.
An Elastic Search index is made of a set of primary shards, plus replicas
of each shard, spread across the nodes of a cluster.
When you create an ES index, its shards are distributed among the nodes
in the cluster.
Any replica shards also will be distributed around the cluster.
Any one shard is actually a separate Lucene index with its own terms,
documents, frequency information, etc.
Each shard contains only some of the documents in an index. Typically the
documents are balanced across all the different shards.
Which nodes hold which shards, where the replicas are, and which shard
gets a given document are all controllable within ES.
When you index a document, it is routed to ONLY ONE shard
in the index and copied to that shard's replicas.
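A minimal sketch of that routing rule, assuming the usual description "shard = hash(routing value) % number of primary shards", where the routing value is the document id by default. `zlib.crc32` is a stand-in hash chosen so the example is deterministic; it is not the hash ES uses, and `shard_for` is a hypothetical helper name.

```python
import zlib
from collections import Counter

NUMBER_OF_SHARDS = 5  # the default number of primary shards discussed in this thread


def shard_for(routing_value: str, number_of_shards: int = NUMBER_OF_SHARDS) -> int:
    """Return the single primary shard a document is routed to.

    zlib.crc32 is a stand-in hash so the output is deterministic;
    ES uses its own internal hash, so real shard numbers will differ.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_shards


if __name__ == "__main__":
    # Route 1000 hypothetical document ids and count documents per shard.
    counts = Counter(shard_for(f"doc-{i}") for i in range(1000))
    for shard in sorted(counts):
        print(f"shard {shard}: {counts[shard]} documents")
```

Each id maps to exactly one shard (whose replicas then receive a copy), and with a reasonable hash the per-shard counts should come out roughly even, which is the balancing described above.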
When you search an ES index by sending a query to a node, the query is
sent to every shard in the index (one copy of each shard, either the
primary or a replica).
Therefore nodes don't really "query the same data", but if you ask
one node it will consolidate the results from all shards in the
index, including any shards it holds itself.
Sure, it's a nerdy technical distinction, but I think it is worth mentioning.
If someone who has been at this longer sees any flaws in my attempt to
describe the terms please jump in.
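The "ask one node, consolidate from all shards" step can be sketched as a scatter/gather top-N merge. This is a simplified in-memory model with hypothetical names (`query_shard`, `search`); it leaves out how ES actually moves requests between nodes and fetches the documents. Each shard has to return its own top N, because in the worst case all of the global top N live on a single shard.

```python
import heapq
from typing import Dict, List, Tuple

Hit = Tuple[float, str]  # (score, document id)


def query_shard(shard_docs: Dict[str, float], size: int) -> List[Hit]:
    """One shard scores only its own documents and returns its local top `size` hits."""
    return heapq.nlargest(size, ((score, doc_id) for doc_id, score in shard_docs.items()))


def search(shards: List[Dict[str, float]], size: int = 10) -> List[Hit]:
    """The node that received the query asks every shard, then merges the partial results."""
    partial_results = [query_shard(docs, size) for docs in shards]
    # Keep only the global top `size` hits across all shards.
    return heapq.nlargest(size, (hit for hits in partial_results for hit in hits))


if __name__ == "__main__":
    # Three hypothetical shards with precomputed scores.
    shards = [{"a": 0.9, "b": 0.2}, {"c": 0.7}, {"d": 0.95, "e": 0.1}]
    print(search(shards, size=3))  # [(0.95, 'd'), (0.9, 'a'), (0.7, 'c')]
```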
I was bitten by the distributed nature of answering a query the very
first time I sent a query using the Java API. I built up a minimal search
request and asked for the first 10 of all documents
without any other settings. Against a plain Lucene index this would always
return the same documents. But in ES the results can be inconsistent, because
without some sorting or scoring, the results from any shard were as
good as any other, so ES just gave me the first 10 it found.
That certainly helped me to understand Cluster, Node, Index, Shard and
Documents in ES.
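A toy model of the inconsistency described above: when every hit scores the same and no sort is given, "the first 10" depends only on which shard responses were collected first. The helper names are hypothetical; the fix it illustrates is giving the request an explicit sort key (here simply the document id) so the cut-off is deterministic.

```python
from typing import List


def first_n_as_found(shard_responses: List[List[str]], n: int = 10) -> List[str]:
    """Concatenate hits in the order the shard responses were collected, then cut at n.
    With equal scores and no sort, nothing else decides which hits make the cut."""
    collected = [doc_id for response in shard_responses for doc_id in response]
    return collected[:n]


def first_n_sorted(shard_responses: List[List[str]], n: int = 10) -> List[str]:
    """Same hits, ordered by an explicit sort key (the document id) before the cut,
    so the result no longer depends on the order the responses arrived in."""
    collected = [doc_id for response in shard_responses for doc_id in response]
    return sorted(collected)[:n]


if __name__ == "__main__":
    shard_a = [f"a-{i}" for i in range(8)]  # hypothetical hits from one shard
    shard_b = [f"b-{i}" for i in range(8)]  # hypothetical hits from another shard
    print(first_n_as_found([shard_a, shard_b]) == first_n_as_found([shard_b, shard_a]))  # False
    print(first_n_sorted([shard_a, shard_b]) == first_n_sorted([shard_b, shard_a]))      # True
```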
-Paul
I did look at gateways in the past but decided I would rather have a copy
of the data on each node for redundancy, as a self-backup system.
I am also looking into multicast; someone suggested it may not be
enabled on the network.
When I use bigdesk, it shows the cluster has only 1 node.
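If bigdesk reports a single node, that node has most likely formed a cluster on its own, which would also explain why the servers do not all see the same data. A small sketch for checking this from a script, assuming each node's HTTP interface is on the default port 9200 and using the `_cluster/health` endpoint, whose response includes `number_of_nodes` and `cluster_name`. The helper names are hypothetical.

```python
import json
import urllib.request


def fetch_cluster_health(host: str = "localhost", port: int = 9200) -> dict:
    """GET /_cluster/health from one node's HTTP interface (9200 is the default port)."""
    url = f"http://{host}:{port}/_cluster/health"
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.load(response)


def check_node_count(health: dict, expected_nodes: int) -> str:
    """Compare the node count a node reports with the number of servers you started."""
    seen = health.get("number_of_nodes", 0)
    name = health.get("cluster_name", "?")
    if seen == expected_nodes:
        return f"ok: {seen} nodes in cluster '{name}'"
    return (f"problem: cluster '{name}' has {seen} of {expected_nodes} expected nodes; "
            "check that the nodes can discover each other (e.g. whether multicast works here)")
```

For example, calling `check_node_count(fetch_cluster_health(host), 4)` once per server (hypothetical host names) shows whether every node agrees that 4 nodes have joined.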
Regards
GX
On Tuesday, July 24, 2012 11:55:27 PM UTC+3, Ivan Brusic wrote:
One major item of information that you are missing is the number of
shards and replicas for your index. It is very likely that you do not
have a split brain scenario, but just that your index is simply
divided unevenly between your nodes. With the default settings of 5
shards and 1 replica, you will have 10 total shards divided among 4
nodes.
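Ivan's arithmetic: 5 shards with 1 replica each gives 5 * 2 = 10 shard copies, and 10 copies over 4 nodes cannot land evenly, so with a balanced spread some nodes hold 3 copies and some 2, and no single node holds every shard. The toy allocator below is only a sketch: the round-robin order is an assumption standing in for ES's real balancing, but it does respect the rule that two copies of the same shard never share a node. Under that rule, getting a full copy of the data on each of 4 nodes (GX's goal above) needs `number_of_replicas` set to 3, i.e. the number of nodes minus one.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Copy = Tuple[int, str]  # (shard number, "primary" or "replica")


def allocate(number_of_shards: int, number_of_replicas: int, nodes: List[str]) -> Dict[str, List[Copy]]:
    """Toy allocator: hand out shard copies round-robin, never placing two copies of
    the same shard on one node. The round-robin order is only a stand-in for ES's
    real balancing logic."""
    if number_of_replicas + 1 > len(nodes):
        # ES would leave the extra replicas unassigned; the toy model just refuses.
        raise ValueError("more copies per shard than nodes")
    placement: Dict[str, List[Copy]] = defaultdict(list)
    i = 0
    for shard in range(number_of_shards):
        for copy_index in range(number_of_replicas + 1):
            # Skip nodes that already hold a copy of this shard.
            while any(s == shard for s, _ in placement[nodes[i % len(nodes)]]):
                i += 1
            kind = "primary" if copy_index == 0 else "replica"
            placement[nodes[i % len(nodes)]].append((shard, kind))
            i += 1
    return dict(placement)


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4"]  # hypothetical node names
    for replicas in (1, 3):
        print(f"number_of_replicas = {replicas}")
        for node, copies in sorted(allocate(5, replicas, nodes).items()):
            print(f"  {node}: shards {sorted({s for s, _ in copies})}")
```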
Regarding the data path: are all nodes pointing to the exact same
mount point? That would cause an error. Each node should have its
own unique data path or use the shared FS gateway:
http://www.elasticsearch.org/guide/reference/modules/gateway/fs.html
--
Ivan
On Mon, Jul 23, 2012 at 9:34 AM, GX wrote:
> Hi All
>
> I've been trying to get this setup working for some time but can't
> seem to get it right. I have a problem with the terminology used and
> often get mixed up about what is needed.
>
> My configuration is as follows
>
> 4 servers running 2 websites (beta and live version of the same site),
> clustered for high availability and load balancing.
>
> My elasticsearch setup is as follows
> a network mapped drive (nfs) maps to /my_cluster
> in there is elasticsearch
> /my_cluster/apps/elasticseach
> I could never get ES to run with data on a mapped drive, so I have the
> following settings (these are the only non-default ones):
>
> path.data: /mnt/sdb1/data/
> path.logs: /my_cluster/logs/elasticsearch
> node.master: true
> node.data: true
>
> However, with this configuration, when indexing documents not all nodes
> get all the data. I was told this is called 'split brain' and it was
> suggested that I use 'minimum_master_nodes', which I set to 3. This
> worked, but after doing some batch indexing one of the nodes kept timing
> out and would not restart.
>
> What is the correct configuration to have for this setup?
> Since all nodes are running from the same directory, if the
> elasticsearch.yml is not identical for each node, where/how do I specify
> which config file to use?
>
> After much development and preparation to implement ES, I'm disappointed
> with this hurdle and have lost some confidence in data integrity.
> Restarting a node wiped all data from the other nodes in one of my tests.
> I know this is due to misconfiguration, but it is a major concern.
>
> Regards
>
> GX
>
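For GX's last two questions, a minimal sketch that renders one elasticsearch.yml per node. Assumptions, all hypothetical: the cluster and node names, and a per-node subdirectory under the shared log path (so four nodes do not write into the same log directory). `discovery.zen.minimum_master_nodes` is computed as a strict majority of master-eligible nodes, floor(N/2) + 1, which for GX's 4 nodes gives the 3 already chosen in the thread.

```python
from typing import Dict


def minimum_master_nodes(master_eligible_nodes: int) -> int:
    """Strict majority of master-eligible nodes: floor(N / 2) + 1."""
    return master_eligible_nodes // 2 + 1


def render_node_config(cluster_name: str, node_name: str, master_eligible_nodes: int) -> str:
    """Return an elasticsearch.yml body for one node.

    cluster_name and node_name are hypothetical values; path.data repeats
    GX's local-disk setting, and the per-node log subdirectory is an assumption.
    """
    settings: Dict[str, object] = {
        "cluster.name": cluster_name,    # must be identical on all nodes
        "node.name": node_name,          # unique per node
        "node.master": "true",
        "node.data": "true",
        "path.data": "/mnt/sdb1/data/",  # local disk on each server, not the NFS share
        "path.logs": f"/my_cluster/logs/elasticsearch/{node_name}",
        "discovery.zen.minimum_master_nodes": minimum_master_nodes(master_eligible_nodes),
    }
    return "".join(f"{key}: {value}\n" for key, value in settings.items())


if __name__ == "__main__":
    for name in ["node1", "node2", "node3", "node4"]:  # hypothetical node names
        print(f"# elasticsearch.yml for {name}")
        print(render_node_config("my_cluster", name, master_eligible_nodes=4))
```

Each server would then keep its own rendered file. Older releases also accepted settings as Java system properties, such as `-Des.config=/path/to/elasticsearch.yml` to point a node at a specific file; treat that flag as something to verify against the documentation for the version in use.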