Huge Data Transfer between Elasticsearch Nodes

Hi Team,

We are running a 4-node cluster as below:

Data Node 1 (hot data, master, ingest)

Data Node 2 (hot data, master, ingest)

Data Node 3 (warm data, voting-only master)

Data Node 4 (warm data)

We are seeing huge data transfer between Data Node 1 and Data Node 2. The current total data size is only 400 GB, but the data transfer between Data Node 1 and Data Node 2 is more than 4 TB.

Is there anything we can set so that the data transfer between these two nodes is minimized? Please help.

Is it causing issues?

Hi Mark,
There is no issue; the cluster is working perfectly.

Currently it is hosted in the AWS public cloud, and Data Node 1 (us-east-1a) and Data Node 2 (us-east-1b) are in two different Availability Zones. A data transfer charge applies between AZs even within the same region. We would like to minimize this.

Assuming you have a replica configured, all indexed or updated data will need to be replicated between the nodes. Some data might also need to be forwarded from one node to the other to reach the primary shard. If you are indexing a lot, there will be a lot of traffic between the nodes. I can’t think of any way to reduce this.
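For reference, you can see how many replicas each index has, and lower the count if you accept less redundancy. A sketch using the standard cat and settings APIs (`my-index` is a placeholder, not a real index name from this cluster):

```
GET _cat/indices?v&h=index,pri,rep,store.size

# Dropping replicas to 0 removes replication traffic entirely,
# but also removes redundancy - usually not what you want:
PUT my-index/_settings
{
  "index": { "number_of_replicas": 0 }
}
```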

Thank you Christian. Yes, we have the replica count configured as 1 for all indices, and I also believe this data transfer is expected behavior.

One query: both of them are ingest nodes. If I set one node's ingest role to false, will that help reduce the traffic apart from the replica traffic?

Since my total data including replicas is less than 400 GB, I assume it is the other node-to-node communication that accounts for nearly 3.5 TB of the transfer.
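One way to verify where the bytes are actually going is the node transport stats, which report cumulative bytes sent and received per node over the transport layer. A sketch using the standard stats APIs:

```
# Cumulative transport traffic (rx_size / tx_size) per node:
GET _nodes/stats/transport?human

# Per-index store size and doc counts, to compare against the transfer volume:
GET _cat/indices?v&h=index,pri,rep,store.size,docs.count
```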

I do not think making only one node an ingest node will reduce traffic. It may actually increase it. What do your data and workload look like? Do you update documents? Do you use nested mappings?
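A quick way to check whether an index uses nested mappings (a sketch; `my-index` is a placeholder):

```
GET my-index/_mapping

# Look for any field with "type": "nested" in the response
```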

Hi Christian,

Sorry, I am new to Elasticsearch, so forgive me if this is not the answer you are looking for.

As of now we use only Auditbeat and Winlogbeat to push the audit-related events. We have also disabled the process and socket events and are dropping a few unwanted events from the Beats themselves.

The Filebeat Cisco module is also used for Cisco ASA-related events.
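For context, dropping unwanted events in a Beat's configuration looks roughly like this (a sketch; the condition field and event ID are hypothetical examples, not from our actual setup):

```yaml
# winlogbeat.yml (or auditbeat.yml) - drop events before they are shipped,
# so they never reach Elasticsearch and never cross the AZ boundary
processors:
  - drop_event:
      when:
        equals:
          winlog.event_id: 5156   # hypothetical noisy event ID
```

Filtering at the Beat is the cheapest place to cut volume, since dropped events are never indexed or replicated.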

If you are using standard modules, there should be no updates or nested mappings as far as I know.

Thanks, Christian. We will monitor the data flow, and if it grows very high, the only option we have is to place both nodes in the same Availability Zone.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.