Hi there,
I have an ES cluster which stops ingesting data every time a node is down.
As the cluster is used for debugging purposes only, it retains logs and has no replication for most of its shards. There are 13 data nodes (i3.3xlarge) and 3 dedicated masters (m4.large).
Also, it seems that the ingestion rate decreases every time shards get reallocated, as well as when a certain node faces high CPU usage.
Any idea what may be happening?
Thanks!
Node info:
[root@ip-10-0-0-212 elasticsearch]# curl localhost:9200
{
"name" : "tfg-es-logs-cluster-node-x",
"cluster_name" : "my-cluster",
"cluster_uuid" : "my-uid",
"version" : {
"number" : "5.6.5",
"build_hash" : "6a37571",
"build_date" : "2017-12-04T07:50:10.466Z",
"build_snapshot" : false,
"lucene_version" : "6.6.1"
},
"tagline" : "You Know, for Search"
}
Cluster health:
[ec2-user@ip-10-x-x-x ~]$ curl localhost:9200/_cluster/health?pretty
{
"cluster_name" : "my-cluster",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 16,
"number_of_data_nodes" : 13,
"active_primary_shards" : 8135,
"active_shards" : 8358,
"relocating_shards" : 1,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}
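For reference, here is a rough sketch of what the health numbers above imply. It only does arithmetic on the values pasted in this post; the function name `summarize` and the key names in its result are made up for illustration. The "unreplicated" figure is a lower bound, assuming each replica copy protects at most one primary.

```python
# Values copied from the _cluster/health output above.
health = {
    "number_of_data_nodes": 13,
    "active_primary_shards": 8135,
    "active_shards": 8358,
}


def summarize(health):
    """Derive replica and per-node shard counts from cluster health fields."""
    primaries = health["active_primary_shards"]
    total = health["active_shards"]
    data_nodes = health["number_of_data_nodes"]

    # active_shards counts primaries plus replica copies.
    replicas = total - primaries

    # Each replica copy covers at most one primary, so at least this many
    # primaries have no replica at all (lower bound, never below zero).
    min_unreplicated = max(0, primaries - replicas)

    return {
        "replica_shards": replicas,
        "min_unreplicated_primaries": min_unreplicated,
        "avg_shards_per_data_node": total / data_nodes,
    }


summary = summarize(health)
for key, value in summary.items():
    if isinstance(value, float):
        print(f"{key}: {value:.1f}")
    else:
        print(f"{key}: {value}")
```

So only a small share of the primaries can have a replica, and each data node holds several hundred shards on average.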
OS:
Amazon Linux
[ec2-user@ip-10-0-0-212 ~]$ uname -a
Linux ip-10-0-0-212 4.9.77-31.58.amzn1.x86_64 #1 SMP Thu Jan 18 22:15:23 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
elasticsearch.yml:
cluster.name: my-cluster
node.name: my-cluster-my-node
path.conf: "/etc/elasticsearch"
path.data: "/mnt/es_disk"
path.logs: "/var/log/elasticsearch"
network.host: 0.0.0.0
discovery.type: ec2
discovery.ec2.groups: sg-my-sg
discovery.ec2.any_group: false
cloud.node.auto_attributes: true
bootstrap.memory_lock: true
script.inline: true
script.stored: true
xpack.security.enabled: false
xpack.monitoring.enabled: false
node.data: true
node.master: false
node.ingest: true
node.attr.index_type: log
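
Since most shards here have no replicas, one way to see which indices are affected is to read the `rep` column of the `_cat/indices` API. Below is a minimal sketch, assuming the node answers on `http://localhost:9200` as in the curl calls above; the helper names `fetch_index_rows` and `indices_without_replicas` are hypothetical, not part of any Elasticsearch client.

```python
import json
from urllib.request import urlopen


def indices_without_replicas(rows):
    """Return the sorted names of indices whose replica count is 0.

    `rows` is the parsed JSON from _cat/indices?h=index,pri,rep&format=json,
    where the cat API reports every column as a string.
    """
    return sorted(row["index"] for row in rows if int(row["rep"]) == 0)


def fetch_index_rows(base_url="http://localhost:9200"):
    """Fetch index name, primary count and replica count for every index.

    base_url is an assumption taken from the curl commands in this post.
    """
    url = f"{base_url}/_cat/indices?h=index,pri,rep&format=json"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


# Made-up sample rows, only to show the expected input shape.
sample_rows = [
    {"index": "logs-a", "pri": "5", "rep": "0"},
    {"index": "logs-b", "pri": "5", "rep": "1"},
]
print(indices_without_replicas(sample_rows))
```

On a node of the cluster, `indices_without_replicas(fetch_index_rows())` would list the real indices; any index in that list loses its primaries on a node while that node is down.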