Huge Data Transfer between Elasticsearch Nodes

Hi Team,

We are running a 4-node cluster as below:

Data Node 1 (hot data, master, ingest)

Data Node 2 (hot data, master, ingest)

Data Node 3 (warm data, voting-only master)

Data Node 4 (warm data)

We are seeing huge data transfer between Data Node 1 and Data Node 2. The current total data size is only 400 GB, but the data transfer between Data Node 1 and Data Node 2 is more than 4 TB.

Is there anything we can set so that the data transfer between these two nodes is minimized? Please help.

Is it causing issues?

Hi Mark,
There is no issue; the cluster is working perfectly.

Currently it is hosted in the AWS public cloud, and Data Node 1 (us-east-1a) and Data Node 2 (us-east-1b) are in two different Availability Zones. A data transfer charge applies between AZs even within the same region. We would like to minimize this.

Assuming you have a replica configured, all indexed or updated data will need to be replicated between the nodes. Some data might also need to be forwarded from one node to the other to reach the primary shard. If you are indexing a lot, there will be a lot of traffic between the nodes. I can’t think of any way to reduce this.
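For reference, you can see how many replicas each index has, and lower the count if you accept less redundancy. A sketch using the standard cat and settings APIs (`my-index` is a placeholder, not a real index name from this cluster):

```
GET _cat/indices?v&h=index,pri,rep,store.size

# Dropping replicas to 0 removes replication traffic entirely,
# but also removes redundancy - usually not what you want:
PUT my-index/_settings
{
  "index": { "number_of_replicas": 0 }
}
```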

Thank you Christian. Yes, we have the replica count configured as 1 for all indices, and I also believe this data transfer is expected behavior.

One query: both of them are ingest nodes. If I set one node's ingest role to false, will that help reduce the traffic apart from the replica traffic?

Since my total data including replicas is less than 400 GB, I assume it is the other node-to-node communication that accounts for nearly 3.5 TB of the transfer.
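One way to verify where the bytes are actually going is the node transport stats, which report cumulative bytes sent and received per node over the transport layer. A sketch using the standard stats APIs:

```
# Cumulative transport traffic (rx_size / tx_size) per node:
GET _nodes/stats/transport?human

# Per-index store size and doc counts, to compare against the transfer volume:
GET _cat/indices?v&h=index,pri,rep,store.size,docs.count
```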

I do not think making only one node an ingest node will reduce traffic. It may actually increase it. What do your data and workload look like? Do you update documents? Do you use nested mappings?
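A quick way to check whether an index uses nested mappings (a sketch; `my-index` is a placeholder):

```
GET my-index/_mapping

# Look for any field with "type": "nested" in the response
```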

Hi Christian,

Sorry, I am new to Elasticsearch, so forgive me if this is not the answer you are looking for.

As of now we use only Auditbeat and Winlogbeat to push the audit-related events. We have also disabled the process and socket events and are dropping a few unwanted events from the Beats themselves.

The Filebeat Cisco module is also used for Cisco ASA-related events.
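For context, dropping unwanted events in a Beat's configuration looks roughly like this (a sketch; the condition field and event ID are hypothetical examples, not from our actual setup):

```yaml
# winlogbeat.yml (or auditbeat.yml) - drop events before they are shipped,
# so they never reach Elasticsearch and never cross the AZ boundary
processors:
  - drop_event:
      when:
        equals:
          winlog.event_id: 5156   # hypothetical noisy event ID
```

Filtering at the Beat is the cheapest place to cut volume, since dropped events are never indexed or replicated.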

If you are using standard modules, there should be no updates or nested mappings as far as I know.

Thanks, Christian. We will monitor the data flow, and if it grows very high, the only option we have is to place both nodes in the same Availability Zone.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.