Hi James,
you can choose a setup within a single cluster, where the nodes (the
cluster members) serve different purposes. There is no need for a second
cluster.
ES nodes can be started in a data-only mode, without an HTTP server, so they
never process client requests directly and only do the heavy lifting.
Proxy nodes can be started without data, but with HTTP, so they only
process client requests and forward them to the data nodes involved in the
queries.
You can start as many proxy nodes and data nodes as you want, so you can
scale the two kinds of nodes independently of each other.
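
As a rough sketch of the two node roles described above, the following Python snippet renders the settings each kind of node might carry in its elasticsearch.yml. The setting names node.data, node.master and http.enabled are what I would expect for ES releases of this era, so please verify them against your version. The helper names and the choice to keep proxy nodes out of master election are assumptions made for illustration only.

```python
# Sketch: settings for the two node roles inside one cluster.
# Assumption: node.data / node.master / http.enabled are the setting names
# used by ES releases of this era; verify against your version.

def data_node_settings():
    """Data-only node: holds shards, runs no HTTP server."""
    return {"node.data": True, "http.enabled": False}

def proxy_node_settings():
    """Proxy node: holds no shards, serves HTTP, forwards to data nodes."""
    # Assumption: proxies are also kept out of master election.
    return {"node.data": False, "node.master": False, "http.enabled": True}

def render_yaml(settings):
    """Render flat settings as elasticsearch.yml lines, sorted by key."""
    return "\n".join(
        "{}: {}".format(key, str(value).lower())
        for key, value in sorted(settings.items())
    )

if __name__ == "__main__":
    print("# data node")
    print(render_yaml(data_node_settings()))
    print("# proxy node")
    print(render_yaml(proxy_node_settings()))
```

Both kinds of node would use the same cluster name, so they join one cluster and differ only in these role settings.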
In my view, separating proxy and data nodes into two clusters causes a lot
of hassle. Nodes cannot talk to each other across cluster boundaries.
You would have to store your data twice using your client tool
alone (while ES can do this for you much more easily with replica levels),
and afterwards you would have to keep the data in sync when nodes fail
(which is tedious with external client tools, while ES does it
for you automatically through replicated shards and allocation control).
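
To illustrate what "replica levels" means here, this sketch builds the request for changing the replica count of an index through the update-settings API (a PUT to /{index}/_settings carrying index.number_of_replicas). The index name "docs" and the helper name are hypothetical, and the request is only constructed, not sent.

```python
import json

def replica_update_request(index, replicas):
    """Build (method, path, body) for changing an index's replica count."""
    if replicas < 0:
        raise ValueError("replicas must be >= 0")
    body = {"index": {"number_of_replicas": replicas}}
    return "PUT", "/{}/_settings".format(index), json.dumps(body)

if __name__ == "__main__":
    # Hypothetical index name; 2 replicas means three copies of every shard.
    method, path, body = replica_update_request("docs", 2)
    print(method, path, body)
```

Once the replica count is raised, ES allocates the additional shard copies to other data nodes by itself, with no second write from the client.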
Cheers,
Jörg
On Friday, November 9, 2012 3:15:39 AM UTC+1, James Boehmer wrote:
It would be solely for querying. For example, we'd like to have a cluster
with 5 shards/1 replica being constantly indexed and queried. Then we'd
like to have a second cluster for serving external query traffic, but would
get its data from the first cluster. The second cluster would have its own
complete set of primary/replica shards separate from the first cluster.
However, we would like it to be indexed passively from the first cluster
instead of having to manually index both clusters simultaneously. The
purpose of the second cluster is to be able to scale and absorb traffic
independently from the internal cluster. It's somewhat important that they
not interfere with each other, but I suppose that an entire single cluster
could scale to handle all of the traffic anyway.
On Thursday, November 8, 2012 7:53:05 PM UTC-5, Jörg Prante wrote:
Hi,
can you please elaborate on what kind of "traffic" you mean? Is it data load
for indexing, search requests hitting the cluster, or both?
You can set up data-less ES nodes that absorb all the network
connection load; that is very easy. You can also dedicate data-less nodes
to different ports, if that is what you mean by separating internal and
external traffic.
Jörg
On Thursday, November 8, 2012 8:41:35 PM UTC+1, James Boehmer wrote:
I'm actually looking for something very similar, which I do not believe
is the same as that _source river request. I need to run two Elasticsearch
stacks separately but simultaneously, to segregate internal traffic from
external traffic. With Solr I would set up a single master, and run two
sets of slaves load balanced independently. That way the internal slaves
could never be affected by traffic hitting the external slaves, and vice
versa. But with ES is there a way to set up a handful of nodes that are
basically their own cluster, but get their data from a master cluster which
does not store the _source?
On Thursday, September 27, 2012 9:06:17 PM UTC-4, David Pilato wrote:
See here: "[Feature Request] Add a river to ElasticSearch instance", Issue #1077 in elastic/elasticsearch on GitHub
--
David 
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
On 28 Sept. 2012 at 02:49, es_learner da...@livefyre.com wrote:
Is it supported?
The objective here is to 'tee' new docs into a secondary index. My current
implementation is to write twice from the client: once to the primary index
and once to the secondary. The primary index gets pruned every month. The
secondary is never pruned.
--
View this message in context:
http://elasticsearch-users.115913.n3.nabble.com/How-to-setup-an-ES-to-ES-river-tp4023286.html