I am hoping to utilize Elasticsearch for our search needs, but I had a few
questions about a particular infrastructure issue that I wasn’t able to get
resolved by looking through the various Elasticsearch documentation and
resources.
The issue is that we have both a datacenter and an internal corporate
network. Let’s say that the Elasticsearch nodes are as follows:
Internal LAN: Node 1
Datacenter: Node 2 & Node 3
We would like search queries executed on the LAN to go only to Node 1 (these
would be search requests from within the company), while queries executed
from web servers in the datacenter should go only to Nodes 2 and 3. Node 1
will be able to communicate with the other two via a VPN, but we wish to
minimize traffic on it (for obvious reasons); this means that we don’t want
a situation where a large number of web requests triggers many queries on
the node inside the LAN (Node 1).
Is it possible to restrict queries to servers in some way that would
accommodate our needs? We still want the nodes to communicate with each
other as needed for Elasticsearch to function properly, and be highly
available by way of replicas, but the idea is to minimize heavy traffic
over bottlenecks (namely the VPN in this case).
I apologize if this question was already asked before, but I wasn't able to
find it. Thank you in advance! =)
The only solution that comes to mind is to make sure that all nodes have
all shards (by setting number_of_replicas to 2, so that each of the three
nodes holds a copy of every shard), execute all queries with preference
(http://www.elasticsearch.org/guide/reference/api/search/preference.html)
set to "_local", and make sure that internal users only connect to Node 1
and external users are load balanced between Node 2 and Node 3.
Unfortunately, this solution will not scale if you ever have more data than
you can fit into a single node.
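To make the suggestion above concrete, here is a rough sketch of the two requests involved and of the client-side routing. It only builds the URLs and request bodies and does not send anything. The host names, the index name "docs", and the helper functions are all made up for illustration; the settings body and the preference parameter follow the index settings and search preference documentation, so check them against the Elasticsearch version you actually run.

```python
import json
from itertools import cycle
from urllib.parse import urlencode

# Hypothetical host names; substitute the real addresses of the three nodes.
LAN_NODES = ["node1.lan.example:9200"]
DATACENTER_NODES = ["node2.dc.example:9200", "node3.dc.example:9200"]


def replica_settings_request(host, index, replicas=2):
    """Build (method, url, body) for setting the replica count of an index."""
    body = json.dumps({"index": {"number_of_replicas": replicas}})
    return "PUT", f"http://{host}/{index}/_settings", body


def local_search_url(host, index, query):
    """Build a URI search asking the receiving node to prefer its own shard copies."""
    params = urlencode({"q": query, "preference": "_local"})
    return f"http://{host}/{index}/_search?{params}"


def node_picker(nodes):
    """Return a function that hands out the given nodes in round-robin order."""
    rotation = cycle(nodes)
    return lambda: next(rotation)


if __name__ == "__main__":
    pick_lan = node_picker(LAN_NODES)        # internal users: always Node 1
    pick_dc = node_picker(DATACENTER_NODES)  # web servers: alternate Node 2 / Node 3

    print(replica_settings_request(pick_lan(), "docs"))
    print(local_search_url(pick_lan(), "docs", "test"))
    print(local_search_url(pick_dc(), "docs", "test"))
    print(local_search_url(pick_dc(), "docs", "test"))
```

In a real deployment the round-robin part would more likely be handled by a load balancer in front of Node 2 and Node 3 than by application code; the picker is only there to show which clients talk to which nodes.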
On Wednesday, November 7, 2012 6:47:38 PM UTC-5, Matt wrote:
Hi Everyone,
[...]
Thanks! We'll try this on some development servers and see how it works.
-- Matt
On Wednesday, November 7, 2012 9:38:59 PM UTC-6, Igor Motov wrote:
[...]