Hi Everyone,
In my deployment I have a 3-node Elasticsearch cluster running in a Kubernetes environment, and 5 clients are writing data to this cluster at a rate of ~7 MB/s per Elasticsearch data node; no read operations are performed. The relevant Elasticsearch configuration details are below: two mount points on two different disks are configured for storing the data, the number of shards is 6, and there is no replication. What we have observed is that there is ~10 MB/s of data transfer happening between the Elasticsearch data nodes. Could someone please explain this internode data transfer, and is there a way to reduce it? I have pasted the iftop output, the checks I plan to run, the version, and the settings below. Thanks.
iftop -B output:
metrics-datastore-0.metrics-datastore.a10harmony.svc.cluster.local => metrics-datastore-1.metrics-datastore.a10harmony.svc.cluster.local 12.2MB 15.4MB 14.4MB
<= 11.3MB 14.1MB 14.1MB
metrics-datastore-0.metrics-datastore.a10harmony.svc.cluster.local => metrics-datastore-2.metrics-datastore.a10harmony.svc.cluster.local 21.4MB 15.2MB 14.4MB
<= 21.9MB 13.9MB 13.9MB
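To rule out shard relocation and to measure the internode traffic from Elasticsearch's own counters, these are the checks I am planning to run (all of these are standard APIs in 6.6, run against the same localhost:9200 endpoint as above):

# Where do the 6 shards live? An indexing request that arrives on one node is
# forwarded to whichever node holds the target shard, so placement matters.
curl -XGET 'localhost:9200/_cat/shards?v'

# Any ongoing shard recoveries or relocations? Those also copy data between nodes.
curl -XGET 'localhost:9200/_cat/recovery?v&active_only=true'

# Per-node transport-layer byte counters (rx_size_in_bytes / tx_size_in_bytes);
# sampling this twice and diffing gives the internode rate as Elasticsearch sees it.
curl -XGET 'localhost:9200/_nodes/stats/transport?pretty'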
ES Version details:
bash-4.1# curl -XGET 'localhost:9200'
{
  "name" : "metrics-datastore-0",
  "cluster_name" : "metrics-datastore",
  "cluster_uuid" : "cNKU0p_2TVGzN350m3KXKw",
  "version" : {
    "number" : "6.6.0",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "a9861f4",
    "build_date" : "2019-01-24T11:27:09.439740Z",
    "build_snapshot" : false,
    "lucene_version" : "7.6.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}
ES settings:
index.number_of_replicas": 0,
index.number_of_shards: 6,
index.refresh_interval: 30s,
index.translog.durability: async,
index.translog.flush_threshold_size: 1g,
index.translog.sync_interval: 10s,
index.unassigned.node_left.delayed_timeout: 10m,
index.mapping.total_fields.limit: 3000
path.data: /data/elasticsearch/data,/data2/elasticsearch/data
path.logs: /logs/elasticsearch,/logs2/elasticsearch
discovery.zen.fd.ping_interval: 10s
discovery.zen.fd.ping_retries: 5
discovery.zen.fd.ping_timeout: 120s
gateway.recover_after_master_nodes: 2
gateway.recover_after_data_nodes: 2
gateway.recover_after_time: 5m
gateway.expected_data_nodes: 3
gateway.expected_master_nodes: 3
gateway.expected_nodes: 6
http.cors.enabled: false
indices.fielddata.cache.size: 10%
indices.memory.index_buffer_size: 30%
thread_pool.write.queue_size: 2000
network.bind_host: , local
network.publish_host:
logger.gateway: TRACE
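For completeness, this is roughly how I verify that the index-level settings above are actually in effect (the index name metrics-2019.06.01 is only a placeholder, not the real one):

# Effective settings of one of the indices being written to
# ("metrics-2019.06.01" is a placeholder index name).
curl -XGET 'localhost:9200/metrics-2019.06.01/_settings?pretty'

# Per-index overview: confirms 6 primary shards and 0 replicas per index.
curl -XGET 'localhost:9200/_cat/indices?v'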