Elasticsearch stops ingesting data when node is down


(Igor Leão) #1

Hi there,

I have an ES cluster which stops ingesting data every time a node is down.
As it is a cluster used for debugging purposes only, it retains logs and has no replication for most of its shards. There are 13 data nodes (i3.3xlarge) and 3 dedicated masters (m4.large).

Also, it seems that the ingestion rate decreases every time shards get reallocated, as well as when a certain node faces high CPU usage.

Any idea what may be happening?

Thanks!

Summary
[root@ip-10-0-0-212 elasticsearch]# curl localhost:9200
{
"name" : "tfg-es-logs-cluster-node-x",
"cluster_name" : "my-cluster",
"cluster_uuid" : my-uid",
"version" : {
"number" : "5.6.5",
"build_hash" : "6a37571",
"build_date" : "2017-12-04T07:50:10.466Z",
"build_snapshot" : false,
"lucene_version" : "6.6.1"
},
"tagline" : "You Know, for Search"
}

Cluster health:
[ec2-user@ip-10-x-x-x ~]$ curl localhost:9200/_cluster/health?pretty
{
"cluster_name" : "my-cluster",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 16,
"number_of_data_nodes" : 13,
"active_primary_shards" : 8135,
"active_shards" : 8358,
"relocating_shards" : 1,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}

OS:
Amazon Linux
[ec2-user@ip-10-0-0-212 ~]$ uname -a
Linux ip-10-0-0-212 4.9.77-31.58.amzn1.x86_64 #1 SMP Thu Jan 18 22:15:23 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Elasticsearch.yml
cluster.name: my-cluster
node.name: my-cluster-my-node
path.conf: "/etc/elasticsearch"
path.data: "/mnt/es_disk"
path.logs: "/var/log/elasticsearch"
network.host: 0.0.0.0
discovery.type: ec2
discovery.ec2.groups: sg-my-sg
discovery.ec2.any_group: false
cloud.node.auto_attributes: true
bootstrap.memory_lock: true
script.inline: true
script.stored: true
xpack.security.enabled: false
xpack.monitoring.enabled: false
node.data: true
node.master: false
node.ingest: true
node.attr.index_type: log


(Christian Dahlqvist) #2

If you do not have a replica configured for an index, the index will go red if you lose a shard, which means you cannot index into it. If you do have a replica, Elasticsearch will promote the replica to primary when you lose the original, and you can continue indexing as there is still at least one copy of each shard.
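A minimal sketch of how replicas could be enabled on the existing indices, assuming the cluster is reachable on localhost:9200 as in the output above (the `_all` target applies the setting to every index; adjust the index pattern if you only want it on some):

```shell
# Add one replica copy to every existing index.
# Note: on 5.x the Content-Type header is required for request bodies.
curl -XPUT 'localhost:9200/_all/_settings?pretty' \
  -H 'Content-Type: application/json' \
  -d '{ "index": { "number_of_replicas": 1 } }'
```

Be aware this roughly doubles the shard count and disk usage, which matters on a cluster that already holds 8000+ primaries.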


(Igor Leão) #3

Thanks @Christian_Dahlqvist.

Let's imagine node X is down for a few minutes. Node X contains primary shards px1, px2 and px3. We also have node Y with primary shards py1, py2 and py3. If Node X is down, should this affect ingestion of py1, py2 and py3? (considering both scenarios where px_i and py_i belong to the same and different indices)


(swarmee.net) #4

If they belong to the same index then indexing into that index should stop - documents routed to the primary shards that are no longer available cannot be indexed.

If they belong to different indexes then happy days.
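To see which indices are actually affected while a node is down, a quick check (assuming the same localhost endpoint as above) is to list the indices whose health has gone red, i.e. those with at least one unassigned primary:

```shell
# List red indices - these are the ones rejecting writes
# while their primary shards are unavailable.
curl 'localhost:9200/_cat/indices?v&health=red'
```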

I recommend turning on compression on the indexes and replicating the data. Data loss is not fun, even if the data is just logs.
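One way to sketch both suggestions for future log indices is an index template (the template name `logs_defaults` and the pattern `logs-*` are assumptions; on 5.x the key is `template`, not the later `index_patterns`). Note that `index.codec` only takes effect at index creation, so it cannot simply be set on an open existing index:

```shell
# Template applying best_compression and one replica
# to any newly created index matching logs-*.
curl -XPUT 'localhost:9200/_template/logs_defaults?pretty' \
  -H 'Content-Type: application/json' \
  -d '{
    "template": "logs-*",
    "settings": {
      "index.codec": "best_compression",
      "index.number_of_replicas": 1
    }
  }'
```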


(system) #5

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.