I have 2 instances of Elasticsearch on 2 different machines in different
networks (ip-A and ip-B). They have the same cluster name, but I don't want
them to replicate data between them, so I disabled all discovery
configuration by setting discovery.zen.ping.multicast.enabled to false on
both (ideally they would not even know about each other, remaining
completely unaware of one another). I would then like another Elasticsearch
instance to act as a search load balancer: search requests enter through
it, it queries the other 2 instances, and it aggregates the results.
Is this possible? I already tried to configure the search load balancer
like this:
Why do you want your data nodes to be unaware of each other? If your goal is truly to prevent replication between nodes, setting number_of_replicas to 0 in your index settings should inherently achieve this.
And if you have multiple shards, such as 1 shard on each of your 2 data nodes, transport communication must be open between the nodes (which is unrelated to multicast discovery -- discovery is only used during master election). If the data for a search response lives on the node opposite the one the load balancer sent the request to, the receiving node must fetch that data from the other node. That is, of course, only if you have no replica -- a replica would allow each node to answer any search request directly.
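For example, disabling replicas on an existing index is a single settings update over the REST API (the index name here is hypothetical):

```
PUT /my_index/_settings
{
  "index": { "number_of_replicas": 0 }
}
```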
As far as configuring a load balancing (or client) node, these are the settings I use on the client nodes (which in my case are API servers):
node:
data: false
client: true
You can still set master to false, if needed in your situation of course. I just happen to use specific data nodes as master in my case.
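As a sketch, a pure client (non-data, non-master) node's elasticsearch.yml could then look like this -- assuming a pre-2.0 configuration format, as in the snippet above:

```yaml
node:
  data: false    # holds no shards
  master: false  # never eligible for master election
  client: true   # acts only as a coordinating/client node
```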
You should decide at some point whether you want a cluster or not. Your
scenario sounds somewhat hybrid. Maybe it makes sense to run two separate
Elasticsearch clusters (with two different cluster names) in two different
subnets, which hold the same data, and simply put an nginx in front of them
to do round-robin load balancing. That said, I still don't see the point of
having completely independent clusters and then syncing them again at
application level -- but maybe that's just an unusual requirement.
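A minimal nginx round-robin sketch could look like this (the upstream addresses are placeholders for the two clusters' HTTP endpoints):

```nginx
upstream elasticsearch {
    # round robin is nginx's default balancing strategy
    server ip-A:9200;
    server ip-B:9200;
}

server {
    listen 8080;
    location / {
        proxy_pass http://elasticsearch;
    }
}
```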
In any case, you have to go over HTTP instead of using an Elasticsearch node
and its transport protocol to load-balance between two mutually unknown clusters.
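If the two clusters must stay independent, the fan-out-and-aggregate the question describes can be sketched at application level over HTTP. Everything below is an illustration, not a definitive implementation: the node URLs and index name are placeholders, the response shapes are assumed to follow the standard search-response layout, and the "aggregation" is simply re-sorting the combined hits by _score.

```python
import json
from urllib.request import urlopen, Request

# Placeholder endpoints -- replace with the real ip-A / ip-B addresses.
NODES = ["http://ip-A:9200", "http://ip-B:9200"]

def search_node(base_url, index, query, size=10):
    """POST a search request to a single Elasticsearch node over HTTP."""
    req = Request(
        "%s/%s/_search" % (base_url, index),
        data=json.dumps({"query": query, "size": size}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    return json.loads(urlopen(req).read().decode("utf-8"))

def merge_hits(responses, size=10):
    """Combine the hit lists of several search responses, re-sorted by _score."""
    hits = [h for r in responses for h in r["hits"]["hits"]]
    hits.sort(key=lambda h: h["_score"], reverse=True)
    return hits[:size]

# Demonstration with canned responses (no live cluster needed):
resp_a = {"hits": {"hits": [{"_id": "1", "_score": 2.0},
                            {"_id": "2", "_score": 0.5}]}}
resp_b = {"hits": {"hits": [{"_id": "3", "_score": 1.5}]}}
print([h["_id"] for h in merge_hits([resp_a, resp_b])])  # -> ['1', '3', '2']
```

Note that naive score merging is only a rough approximation: each cluster computes relevance scores from its own term statistics, so scores from independent clusters are not strictly comparable.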