Elastic search Cluster indexing

Hitesh_Chavhan · May 20, 2016, 4:50am

hi,

I have a two-node ES cluster.
I need to index data into the cluster. Do I need to provide both IPs in my indexing script?
I am doing this using Python.

Thanks.

magnusbaeck · May 20, 2016, 5:34am

If you want your script to tolerate one of the servers going down, you need to point it at both servers; otherwise, one of them is enough.

(The ES cluster itself obviously also needs to be resilient if one node goes down.)
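As a rough illustration of pointing a script at both servers, here is a minimal standard-library sketch that tries each node in turn. The node addresses, the `/myindex/mytype` style path, and the helper names `http_send` and `index_with_failover` are all assumptions made up for this example, not something from the thread. The official elasticsearch Python client also accepts a list of hosts, which may make hand-rolled failover unnecessary.

```python
import json
import urllib.request

# Hypothetical node addresses; replace with the two nodes' real IPs.
NODES = ["http://10.0.0.1:9200", "http://10.0.0.2:9200"]


def http_send(node, path, doc, timeout=5):
    """POST one JSON document to a node over the Elasticsearch REST API."""
    req = urllib.request.Request(
        node + path,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def index_with_failover(doc, path, nodes=NODES, send=http_send):
    """Try each node in order; return (node, response) from the first that answers."""
    last_error = None
    for node in nodes:
        try:
            return node, send(node, path, doc)
        except OSError as err:  # URLError and socket timeouts are OSError subclasses
            last_error = err
    raise ConnectionError("no node accepted the document") from last_error
```

A call might look like `index_with_failover({"title": "hello"}, "/myindex/mytype")`, where the index and type names are placeholders. Note that this sketch also moves on to the next node after an HTTP error response, which a real script would probably want to treat differently from a dead node.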

geekpete · May 20, 2016, 5:58am

Hi Hitesh,

Your indexing script will work as long as it can connect to any available "client" node, that is any node with client mode enabled that routes queries to data nodes. So connecting to just one node would work fine.

I'd guess that your two nodes are set up in the default mode, which enables all roles: client, master, and data.

Some people also balance connections with DNS round robin across all of their client nodes, that is, a single address that resolves to both of the Elasticsearch nodes. Your indexing script then connects to whichever node the DNS lookup returns when it starts up. For high availability, some people set up a health check that detects when a node is no longer available and takes it out of DNS.

You could also spray indexing traffic at both of these nodes to balance the workload between them.
You can do this either by fronting them with a load balancer (like an ELB or HAProxy) or, if you're tricky enough, by having your script start a separate set of threads for each node listed in its config, so that indexing work goes to both of them.

You could run a copy of the index script for each node as well, if your use case allows for parallel indexing.
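A minimal sketch of the threads-per-node idea, assuming a hypothetical `send(node, doc)` callable that does the actual indexing request (for example over HTTP). One worker per node pulls documents from a shared queue, so a faster node naturally takes more of the work:

```python
import queue
import threading


def index_in_parallel(docs, nodes, send):
    """Run one worker thread per node; each pulls documents from a shared queue."""
    work = queue.Queue()
    for doc in docs:
        work.put(doc)

    sent = {node: 0 for node in nodes}  # per-node count of indexed documents

    def worker(node):
        while True:
            try:
                doc = work.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker is done
            send(node, doc)
            sent[node] += 1  # only this node's thread writes this key

    threads = [threading.Thread(target=worker, args=(node,)) for node in nodes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sent
```

This sketch has no error handling: if `send` raises, that worker stops and the document it was holding is dropped, so a real script would want to put failed documents back on the queue or retry them.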

Let me know if this answers your question or if you want more info.

Thanks.

Hitesh_Chavhan · May 20, 2016, 6:48am

In this case, if I index data through only one IP and that node goes down, will Elasticsearch be able to fetch the data from the other node?

geekpete · May 20, 2016, 7:03am

If you're connecting to both servers in your script, the script should retry or reconnect when indexing fails because a node went down. If it is designed right, the separate threads that are connected to the other node should continue indexing.

With a load balancer or round-robin DNS, this should be handled to some extent by a health check probe of some type. With DNS, your script will need to re-resolve the name on failure or reconnection to get a new IP to connect to.
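The re-resolve step could be sketched roughly as below. The hostname passed in, the `send(ip, doc)` callable, and both function names are hypothetical; the only real API used is `socket.getaddrinfo`. Keep in mind that the operating system or a local resolver may cache DNS answers, so a fresh lookup is not guaranteed to reflect the health check immediately.

```python
import socket
import time


def resolve_ips(hostname, port=9200, resolver=socket.getaddrinfo):
    """Return the distinct IP addresses the name currently resolves to."""
    infos = resolver(hostname, port, proto=socket.IPPROTO_TCP)
    ips = []
    for info in infos:
        ip = info[4][0]  # the sockaddr tuple starts with the address string
        if ip not in ips:
            ips.append(ip)
    return ips


def send_with_reresolve(doc, hostname, send, attempts=3, delay=1.0,
                        resolver=socket.getaddrinfo):
    """On every attempt, look the name up again and try each address it returns."""
    last_error = None
    for attempt in range(attempts):
        try:
            ips = resolve_ips(hostname, resolver=resolver)
        except OSError as err:  # socket.gaierror is an OSError subclass
            last_error = err
            ips = []
        for ip in ips:
            try:
                return ip, send(ip, doc)
            except OSError as err:
                last_error = err
        if attempt < attempts - 1:
            time.sleep(delay)  # give the health check time to update DNS
    raise ConnectionError("all attempts failed") from last_error
```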

Hitesh_Chavhan · May 20, 2016, 7:55am

Yes, that helped.

So in my use case, I am indexing the data through a single IP, and when fetching the data through the API I will also provide this same IP. Will ES fetch data from both nodes while querying, or will it only return data from the one IP that I provide in the API?

I want to provide only one IP both while indexing and while retrieving. Is that good practice, and what are the consequences of such an approach?

geekpete · May 20, 2016, 12:56pm

When querying, any node acting as a client node will route requests to the data nodes that hold the particular shards containing the documents to be retrieved or indexed. Depending on what the query is and where the shards are allocated, this can be one or more nodes. (For example, a match-all query requests all docs, so it would hit both nodes if the shards of an index are spread across both nodes.)
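To make the routing idea concrete, here is a toy model, not Elasticsearch's actual implementation. Elasticsearch picks a shard by hashing the routing value (the document ID by default) modulo the number of primary shards; the sketch uses `zlib.crc32` purely as a stand-in hash, and the shard count and shard-to-node layout are invented. Replicas are ignored.

```python
import zlib

# Hypothetical layout: an index with 4 primary shards spread over two nodes.
NUM_PRIMARY_SHARDS = 4
SHARD_TO_NODE = {0: "node-1", 1: "node-2", 2: "node-1", 3: "node-2"}


def shard_for(routing, num_shards=NUM_PRIMARY_SHARDS):
    """Pick a shard from the routing value (the document ID by default).

    crc32 is only a stand-in here; Elasticsearch uses its own hash function.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_shards


def node_for_document(doc_id):
    """Indexing or fetching one document touches the node holding its shard."""
    return SHARD_TO_NODE[shard_for(doc_id)]


def nodes_for_search():
    """A search over the whole index needs one copy of every shard."""
    return sorted(set(SHARD_TO_NODE.values()))
```

In this model a single-document request lands on one node, while a search over the whole index touches both, regardless of which node the client originally connected to.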

Since you will have two nodes set up with the defaults (client+data+master mode), it should work fine if you use either node for querying and indexing.

When you need to scale your cluster, you have the option to use dedicated client nodes that only perform query/index routing and don't hold data and don't act as master eligible. You might also consider dedicated master nodes so that they don't have to worry about storing data and can be dedicated to managing cluster state.

You have the flexibility to scale as you need to fit your use case, but there are some known patterns that work quite well for scaling up.

In short, you should be fine to query either server. You can also look at load balancing methods for spreading traffic evenly and for high availability.
