Dear Christian,
In our production cluster, some of the indices look abnormal: an index first shows GREEN state, then after some time it shows RED state. How is this possible?
ex. green open imiconnect_inf_mo-2020-02-19 mp0-ITIHSg-1fbcP8aSzIQ 1 1 433229 0 455mb 229.1mb
red open imiconnect_inf_mo-2020-02-19 mp0-ITIHSg-1fbcP8aSzIQ 1 1
If I delete that index, then other indices go from GREEN to RED state.
I am completely confused by this, could you please help us?
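For reference, the index states above came from the cat indices API, e.g. (assuming the cluster is reachable on localhost:9200):
# list every index with its health, document count and store size
curl -s 'http://localhost:9200/_cat/indices?v'
# show only the indices that are currently red
curl -s 'http://localhost:9200/_cat/indices?v&health=red'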
Please do not ping people not already involved in the thread. This forum is manned by volunteers.
I would guess that you have a problem with either cluster configuration and/or the underlying hardware/storage.
How large is your cluster? What type of hardware is it deployed on? What type of storage are you using? Which Elasticsearch version are you using? How are the nodes configured? Are there any error messages or clues in the logs?
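If you are not sure where to look, something along these lines gathers most of that information (localhost is a placeholder for one of your nodes):
# version and cluster name
curl -s 'http://localhost:9200/'
# node roles plus heap, RAM, CPU and load per node
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,node.role,heap.percent,ram.percent,cpu,load_1m'
# overall cluster health and shard counts
curl -s 'http://localhost:9200/_cluster/health?pretty'
# for a red index, ask the cluster why a shard is unassigned
curl -s 'http://localhost:9200/_cluster/allocation/explain?pretty'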
Which Elasticsearch version are you using?
ES Version : 5.6.4
How are the nodes configured?
The ES cluster has 2 coordinating nodes, 3 master nodes, and 2 data nodes.
Are there any error messages or clues in the logs?
org.elasticsearch.transport.RemoteTransportException: [es-master-1][10.0.123.137:9300][indices:admin/delete]
Caused by: org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (delete-index [[imiconnect_inf_mo-2020-02-19/mp0-ITIHSg-1fbcP8aSzIQ]]) within 30s
at org.elasticsearch.cluster.service.ClusterService$ClusterServiceTaskBatcher.lambda$null$0(ClusterService.java:255) ~[elasticsearch-5.6.4.jar:5.6.4]
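That exception means the master could not apply the delete-index cluster state update within the 30 second timeout. You can see what is queued up on the elected master with the pending tasks API, e.g.:
# cluster state update tasks waiting to be processed by the elected master
curl -s 'http://localhost:9200/_cluster/pending_tasks?pretty'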
What does the elasticsearch.yml file for the master nodes look like? How far apart are the two data centres? What is the latency between them? What type of hardware and storage is used?
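For comparison, a dedicated master node on 5.x typically carries settings along these lines (the values here are illustrative, not your actual configuration):
cluster.name: production
node.name: es-master-1
node.master: true
node.data: false
node.ingest: false
network.host: 0.0.0.0
discovery.zen.ping.unicast.hosts: ["es-master-1", "es-master-2", "es-master-3"]
# with three master-eligible nodes this should be 2 to avoid split brain
discovery.zen.minimum_master_nodes: 2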
There is no latency between them:
-sh-4.1$ ping 192.168.67.24
PING 192.168.67.24 (192.168.67.24) 56(84) bytes of data.
64 bytes from 192.168.67.24: icmp_seq=1 ttl=61 time=2.44 ms
64 bytes from 192.168.67.24: icmp_seq=2 ttl=61 time=2.45 ms
Hardware:
-sh-4.1$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 37
Model name: Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
Stepping: 1
CPU MHz: 2700.000
BogoMIPS: 5400.00
Hypervisor vendor: VMware
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 20480K
NUMA node0 CPU(s): 0,1
Master node logs:
[2020-03-20T07:58:26,662][WARN ][o.e.c.a.s.ShardStateAction] [es-master-1] [imiconnect_voice_trans-2020-03-20][0] received shard failed for shard id [[imiconnect_voice_trans-2020-03-20][0]], allocation id [Z9xZfm09R1SMND0vui3EIw], primary term [1], message [mark copy as stale]
Data node logs:
Caused by: java.lang.IllegalStateException: try to recover [imiconnect_inf_mt-2020-03-19][0] from primary shard with sync id but number of docs differ: 508203 (es-data-2, primary) vs 508181(es-data-1)
[2020-03-20T08:42:28,627][WARN ][o.e.c.s.ClusterService ] [es-master-1] cluster state update task [put-mapping[ump_cs_agg_hr-2020-03-20-08]] took [36.2s] above the warn threshold of 30s
[2020-03-20T08:42:41,842][INFO ][o.e.m.j.JvmGcMonitorService] [es-master-1] [gc][94927] overhead, spent [350ms] collecting in the last [1s]
[2020-03-20T08:43:12,998][INFO ][o.e.c.m.MetaDataMappingService] [es-master-1] [ump_notifications_agg_hr/x-gGZJPoRouuSRAVKj6cUg] create_mapping [ump_notifications_agg_hr-2020-03-20-08]
[2020-03-20T08:43:13,757][WARN ][o.e.c.s.ClusterService ] [es-master-1] cluster state update task [put-mapping[ump_notifications_agg_hr-2020-03-20-08]] took [33.7s] above the warn threshold of 30s
I deleted the index (imiconnect_apnp_trans_log-2020-02-26) because it was in RED state.
Then this index (imiconnect_chat_messages_logs-2020-01-30) went to RED state, although earlier it was GREEN. Can you please tell us what is happening?
This is a very urgent production issue, please help me fix it.
Kindly check these logs:
[2020-03-20T08:55:38,463][INFO ][o.e.c.m.MetaDataDeleteIndexService] [es-master-1] [imiconnect_apnp_trans_log-2020-02-26/CVNriLFrRt6jEe7V7YPOVQ] deleting index
[2020-03-20T08:55:44,211][INFO ][o.e.g.LocalAllocateDangledIndices] [es-master-1] auto importing dangled indices [[imiconnect_chat_messages_logs-2020-01-30/N4vcNm-mQy6ZWaBjxPfGvg]/OPEN] from [{es-coord-1}{ShbcqaRsSLKbJTNPybfRXA}{Qou2A3JbT-KcuxAHo8_v6Q}{10.0.123.136}{10.0.123.136:9300}]
As I stated earlier this forum is manned by volunteers. Do not ping people not already involved in the thread. This also means there are no SLAs and not even a guarantee of response.
I would recommend the following:
Make sure your VMs are not overprovisioned so Elasticsearch has access to the cores allocated
Make sure your VMs do not use memory ballooning so Elasticsearch actually has access to the RAM it thinks it has
Your cluster state updates seem to be slow, which could be caused by the above factors. You also have more shards than recommended, so I would recommend reducing this (a quick way to check the count is shown below).
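Something like the following shows how many shard copies the cluster is currently carrying (localhost is a placeholder for one of your nodes):
# one line per shard copy, plus one header line
curl -s 'http://localhost:9200/_cat/shards?v' | wc -l
# active_shards and active_primary_shards in a single view
curl -s 'http://localhost:9200/_cluster/health?pretty'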
If I delete this index (index-2020-03-17), then immediately this one (index-2020-03-01) is automatically created in RED state.
A few indices that were not present in the cluster earlier are automatically being created in RED state.
Logs:
[2020-03-20T13:38:49,613][INFO ][o.e.c.m.MetaDataDeleteIndexService] [es-master-1] [imiconnect_chat_messages_logs-2020-01-29/B7vyDb0zSiWbnB9ZrXwQGg] deleting index
[2020-03-20T13:38:50,662][INFO ][o.e.g.LocalAllocateDangledIndices] [es-master-1] auto importing dangled indices [[imiconnect_chat_messages_logs-2020-02-02/2WTBuIS6RNWY3tTujzoTtA]/OPEN] from [{es-coord-1}{ShbcqaRsSLKbJTNPybfRXA}{Qou2A3JbT-KcuxAHo8_v6Q}{10.0.123.136}{10.0.123.136:9300}]