We are seeing huge data transfer between Data Node 1 and Data Node 2. The current total data size is only 400 GB, but the data transfer between Data Node 1 and Data Node 2 is more than 4 TB.
Is there anything we can set so that the data transfer between these two nodes is minimized? Please help.
Hi Mark,
There is no issue; the cluster is working perfectly.
It is currently hosted in the AWS public cloud, and Data Node 1 (us-east-1a) and Data Node 2 (us-east-1b) are in two different Availability Zones. AWS charges for data transfer between AZs even within the same region, and we would like to minimize this cost.
Assuming you have a replica configured, all indexed or updated data will need to be replicated between the nodes. Some data might also need to be forwarded from one node to the other to reach the primary shard. If you are indexing a lot, there will be a lot of traffic between the nodes. I can't think of any way to reduce this.
I do not think making only one node an ingest node will reduce traffic; it may actually increase it. What do your data and workload look like? Do you update documents? Do you use nested mappings?
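To make the replication point concrete: with one replica on a two-node cluster, every document is written on both nodes, so inter-node transfer naturally exceeds the stored data size, and merges and recoveries add more on top. If the redundancy were expendable (it usually is not), dropping the replica count would remove that replication traffic entirely. A minimal sketch, assuming a hypothetical index named `my-index`:

```
PUT /my-index/_settings
{
  "index": {
    "number_of_replicas": 0
  }
}
```

Note that with zero replicas, losing either node means losing data, so this trades durability for transfer cost.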
Sorry, I am new to Elasticsearch, so forgive me if this is not the answer you are looking for.
As of now we use only Auditbeat and Winlogbeat to push audit-related events. We have also disabled process and socket events, and we are dropping a few unwanted events from the Beats themselves.
The Filebeat Cisco module is also used for Cisco ASA related events.
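For anyone reading along, this kind of filtering at the Beat is done with the module's dataset list and the stock `drop_event` processor. A sketch only; the datasets kept and the drop condition below are illustrative, not the poster's actual config:

```yaml
# auditbeat.yml (sketch): collect only the system datasets we need
auditbeat.modules:
  - module: system
    period: 10s
    datasets:
      - host
      - login
    # the process and socket datasets are simply omitted, so those
    # events are never generated or shipped

processors:
  # drop_event is a stock Beats processor; the condition below is a
  # made-up example of filtering out one noisy source
  - drop_event:
      when:
        equals:
          host.name: "build-agent-01"
```

Filtering this way reduces what reaches Elasticsearch at all, but it does not change how the data that does arrive is replicated between the two nodes.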
Thanks Christian. We will monitor the data flow, and if it keeps growing very high, the only option we have is to host both nodes in the same Availability Zone.
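One way to do that monitoring from inside the cluster (a suggestion, not something mentioned above) is the node stats transport section, which reports cumulative bytes over the inter-node transport layer:

```
GET /_nodes/stats/transport
```

In the response, each node reports `tx_size_in_bytes` and `rx_size_in_bytes`; sampling these periodically and taking the difference gives the inter-node transfer rate, which can be compared against the AWS cross-AZ transfer charges.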