My Elasticsearch server runs fine for a few hours, or sometimes a day, and then suddenly stops working. It is a single node installed on a VPS alongside the application server, with one index of 30,000 documents configured with 1 shard and 1 replica.
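With only one node, the configured replica can never be allocated, so the cluster can reach YELLOW at best; that matches the RED-to-YELLOW transition in the logs below. A minimal sketch of dropping the replica count via the index settings API, assuming the index is named reports (as in the logs) and the node listens on localhost:9200 without authentication:

```python
# Minimal sketch, not a fix for the crashes: with a single node the
# replica of the "reports" shard can never be assigned, so the cluster
# can only ever reach YELLOW. Assumes Elasticsearch on localhost:9200
# with security disabled; the index name "reports" comes from the logs.
import json
import urllib.request

req = urllib.request.Request(
    "http://127.0.0.1:9200/reports/_settings",
    data=json.dumps({"index": {"number_of_replicas": 0}}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="PUT",
)
with urllib.request.urlopen(req, timeout=10) as resp:
    print(json.load(resp))  # expect {"acknowledged": true}
```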
My Configuration:
- VPS: 1 CPU core, 2 GB RAM
- Ubuntu 20.10
- Elasticsearch version: 7.8.0
- Heap size: -Xms1g -Xmx1g
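Given this configuration, heap and disk pressure can be confirmed from the REST API before digging into the logs. A minimal polling sketch, again assuming the node listens on localhost:9200 with security disabled:

```python
# Minimal monitoring sketch: poll cluster health, JVM heap usage, and
# disk stats. Assumes Elasticsearch on localhost:9200, no authentication.
import json
import urllib.request

BASE = "http://127.0.0.1:9200"

def get(path):
    with urllib.request.urlopen(BASE + path, timeout=10) as resp:
        return json.load(resp)

health = get("/_cluster/health")
print("status:", health["status"],
      "| unassigned shards:", health["unassigned_shards"])

stats = get("/_nodes/stats/jvm,fs")
for node in stats["nodes"].values():
    mem = node["jvm"]["mem"]
    fs = node["fs"]["total"]
    print("heap used:", mem["heap_used_percent"], "%",
          "| disk free:", fs["available_in_bytes"] // 2**20, "MB")
```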
When I checked the logs, it seems that the Elasticsearch server stops after the filesystem health check (see the timing sketch after this excerpt):
[2021-03-29T01:30:00,007][INFO ][o.e.x.m.MlDailyMaintenanceService] [node-1] triggering scheduled [ML] maintenance tasks
[2021-03-29T01:30:00,032][INFO ][o.e.x.s.SnapshotRetentionTask] [node-1] starting SLM retention snapshot cleanup task
[2021-03-29T01:30:00,084][INFO ][o.e.x.s.SnapshotRetentionTask] [node-1] there are no repositories to fetch, SLM retention snapshot cleanup task complete
[2021-03-29T01:30:00,232][INFO ][o.e.x.m.a.TransportDeleteExpiredDataAction] [node-1] Deleting expired data
[2021-03-29T01:30:00,611][INFO ][o.e.x.m.j.r.UnusedStatsRemover] [node-1] Successfully deleted [0] unused stats documents
[2021-03-29T01:30:00,621][INFO ][o.e.x.m.a.TransportDeleteExpiredDataAction] [node-1] Completed deletion of expired ML data
[2021-03-29T01:30:00,622][INFO ][o.e.x.m.MlDailyMaintenanceService] [node-1] Successfully completed [ML] maintenance task: triggerDeleteExpiredDataTask
[2021-03-29T02:38:45,814][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][60425] overhead, spent [423ms] collecting in the last [1s]
[2021-03-29T14:02:17,728][WARN ][o.e.m.f.FsHealthService ] [node-1] health check of [/var/lib/elasticsearch/nodes/0] took [12258ms] which is above the warn threshold of [5s]
[2021-03-29T14:07:46,549][WARN ][o.e.m.f.FsHealthService ] [node-1] health check of [/var/lib/elasticsearch/nodes/0] took [5140ms] which is above the warn threshold of [5s]
[2021-03-29T14:09:17,396][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][101248] overhead, spent [553ms] collecting in the last [1.7s]
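As far as I understand, the FsHealthService warning means a small write followed by an fsync on the data path took longer than the default 5s threshold, so the disk was very slow at those moments. A rough, hypothetical reproduction of such a probe (the data path is taken from the log lines above; it needs write access to that directory):

```python
# Hypothetical reproduction of a filesystem health probe: time a small
# write + fsync on the Elasticsearch data path from the logs above.
# Must run as a user with write access to that directory.
import os
import time

DATA_PATH = "/var/lib/elasticsearch/nodes/0"  # path from the log messages
probe = os.path.join(DATA_PATH, "health-probe.tmp")

start = time.monotonic()
fd = os.open(probe, os.O_CREAT | os.O_WRONLY)
try:
    os.write(fd, b"probe")
    os.fsync(fd)  # force the write to physical storage
finally:
    os.close(fd)
    os.remove(probe)

print(f"write+fsync took {(time.monotonic() - start) * 1000:.1f} ms")
```

On a VPS, multi-second fsyncs usually mean the host's storage is oversubscribed or the VM is being throttled.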
Another log excerpt:
[2021-03-31T02:01:56,154][INFO ][o.e.c.r.a.AllocationService] [node-1] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[reports][0]]]).
[2021-03-31T06:08:19,126][WARN ][o.e.m.f.FsHealthService ] [node-1] health check of [/var/lib/elasticsearch/nodes/0] took [6940ms] which is above the warn threshold of [5s]
[2021-03-31T06:54:45,818][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][young][16670][19] duration [2.7s], collections [1]/[26.8s], total [2.7s]/[4.4s], memory [694.9mb]->[90.9mb]/[1gb], all_pools {[young] [604mb]->[0b]/[0b]}{[old] [89.9mb]->[89.>
[2021-03-31T07:03:44,953][WARN ][o.e.m.f.FsHealthService ] [node-1] health check of [/var/lib/elasticsearch/nodes/0] took [7148ms] which is above the warn threshold of [5s]
[2021-03-31T07:11:03,918][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][young][16975][20] duration [11.3s], collections [1]/[12s], total [11.3s]/[15.7s], memory [130.9mb]->[90.8mb]/[1gb], all_pools {[young] [40mb]->[0b]/[0b]}{[old] [89.9mb]->[89.>
[2021-03-31T07:11:06,610][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][16975] overhead, spent [11.3s] collecting in the last [12s]
[2021-03-31T07:28:04,708][WARN ][o.e.m.f.FsHealthService ] [node-1] health check of [/var/lib/elasticsearch/nodes/0] took [5557ms] which is above the warn threshold of [5s]
[2021-03-31T07:30:30,545][WARN ][o.e.m.f.FsHealthService ] [node-1] health check of [/var/lib/elasticsearch/nodes/0] took [5035ms] which is above the warn threshold of [5s]
[2021-03-31T07:35:07,502][WARN ][o.e.m.f.FsHealthService ] [node-1] health check of [/var/lib/elasticsearch/nodes/0] took [5732ms] which is above the warn threshold of [5s]
[2021-03-31T07:35:12,985][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][young][17163][21] duration [4.9s], collections [1]/[3.4s], total [4.9s]/[20.6s], memory [126.8mb]->[130.8mb]/[1gb], all_pools {[young] [36mb]->[0b]/[0b]}{[old] [89.9mb]->[89.>
[2021-03-31T07:35:16,582][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][17163] overhead, spent [4.9s] collecting in the last [3.4s]
[2021-03-31T07:44:37,323][WARN ][o.e.h.AbstractHttpServerTransport] [node-1] handling request [null][POST][/reports/_count][Netty4HttpChannel{localAddress=/127.0.0.1:9200, remoteAddress=/127.0.0.1:37814}] took [9836ms] which is above the warn thresho>
[2021-03-31T07:51:15,633][WARN ][o.e.m.f.FsHealthService ] [node-1] health check of [/var/lib/elasticsearch/nodes/0] took [7832ms] which is above the warn threshold of [5s]
[2021-03-31T08:00:57,701][WARN ][o.e.m.f.FsHealthService ] [node-1] health check of [/var/lib/elasticsearch/nodes/0] took [7313ms] which is above the warn threshold of [5s]
[2021-03-31T08:05:13,225][WARN ][o.e.m.f.FsHealthService ] [node-1] health check of [/var/lib/elasticsearch/nodes/0] took [5179ms] which is above the warn threshold of [5s]
[2021-03-31T08:07:50,096][WARN ][o.e.m.f.FsHealthService ] [node-1] health check of [/var/lib/elasticsearch/nodes/0] took [6490ms] which is above the warn threshold of [5s]
[2021-03-31T08:19:56,215][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][young][17648][23] duration [1.1s], collections [1]/[1.4s], total [1.1s]/[21.9s], memory [131mb]->[91mb]/[1gb], all_pools {[young] [40mb]->[0b]/[0b]}{[old] [89.9mb]->[89.9mb]/>
[2021-03-31T08:19:56,957][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][17648] overhead, spent [1.1s] collecting in the last [1.4s]
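The 11.3s young-generation pause above happened on a heap that was only about 130 MB of 1 GB full, which usually points to the JVM being swapped out or starved of CPU rather than to heap pressure; that is plausible on a 1-core, 2 GB VPS that also runs the application server. One thing worth checking is whether the heap is locked in RAM; a minimal sketch, with the same localhost:9200 assumption as above:

```python
# Check whether bootstrap.memory_lock is effective: if "mlockall" is
# false, the heap can be swapped out, which shows up as multi-second
# GC pauses like the ones above. Assumes localhost:9200, no auth.
import json
import urllib.request

url = "http://127.0.0.1:9200/_nodes?filter_path=**.mlockall"
with urllib.request.urlopen(url, timeout=10) as resp:
    print(json.dumps(json.load(resp), indent=2))
    # e.g. {"nodes": {"<node-id>": {"process": {"mlockall": false}}}}
```

If mlockall is false, enabling bootstrap.memory_lock in elasticsearch.yml (together with the matching memlock ulimit or systemd change) prevents the heap from being swapped.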
I am not sure what the reason could be. How can I solve this issue?