Elasticsearch stops working after some time

My Elasticsearch server works fine for a few hours or a day and then suddenly stops working. It is a single node installed on a VPS alongside the application server, with only 1 index (30,000 documents) configured with 1 shard and 1 replica.
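
For reference, the index and cluster status can be checked with the standard APIs (the node listens on the default localhost:9200):

    curl 'localhost:9200/_cat/indices?v'
    curl 'localhost:9200/_cluster/health?pretty'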

My Configuration:

  • VPS: 1 CPU Core, 2GB RAM
  • Ubuntu 20.10
  • ElasticSearch Version: 7.8.0
  • Heap Size: -Xms1g -Xmx1g
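
For reference, the heap settings above are the ones in /etc/elasticsearch/jvm.options (assuming the default path of the Debian/Ubuntu package):

    # /etc/elasticsearch/jvm.options
    -Xms1g
    -Xmx1g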

When I checked the logs, it seems the Elasticsearch server stops after the health check.

    [2021-03-29T01:30:00,007][INFO ][o.e.x.m.MlDailyMaintenanceService] [node-1] triggering scheduled [ML] maintenance tasks
    [2021-03-29T01:30:00,032][INFO ][o.e.x.s.SnapshotRetentionTask] [node-1] starting SLM retention snapshot cleanup task
    [2021-03-29T01:30:00,084][INFO ][o.e.x.s.SnapshotRetentionTask] [node-1] there are no repositories to fetch, SLM retention snapshot cleanup task complete
    [2021-03-29T01:30:00,232][INFO ][o.e.x.m.a.TransportDeleteExpiredDataAction] [node-1] Deleting expired data
    [2021-03-29T01:30:00,611][INFO ][o.e.x.m.j.r.UnusedStatsRemover] [node-1] Successfully deleted [0] unused stats documents
    [2021-03-29T01:30:00,621][INFO ][o.e.x.m.a.TransportDeleteExpiredDataAction] [node-1] Completed deletion of expired ML data
    [2021-03-29T01:30:00,622][INFO ][o.e.x.m.MlDailyMaintenanceService] [node-1] Successfully completed [ML] maintenance task: triggerDeleteExpiredDataTask
    [2021-03-29T02:38:45,814][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][60425] overhead, spent [423ms] collecting in the last [1s]
    [2021-03-29T14:02:17,728][WARN ][o.e.m.f.FsHealthService  ] [node-1] health check of [/var/lib/elasticsearch/nodes/0] took [12258ms] which is above the warn threshold of [5s]
    [2021-03-29T14:07:46,549][WARN ][o.e.m.f.FsHealthService  ] [node-1] health check of [/var/lib/elasticsearch/nodes/0] took [5140ms] which is above the warn threshold of [5s]
    [2021-03-29T14:09:17,396][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][101248] overhead, spent [553ms] collecting in the last [1.7s]   

Another log:

    [2021-03-31T02:01:56,154][INFO ][o.e.c.r.a.AllocationService] [node-1] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[reports][0]]]).
    [2021-03-31T06:08:19,126][WARN ][o.e.m.f.FsHealthService  ] [node-1] health check of [/var/lib/elasticsearch/nodes/0] took [6940ms] which is above the warn threshold of [5s]
    [2021-03-31T06:54:45,818][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][young][16670][19] duration [2.7s], collections [1]/[26.8s], total [2.7s]/[4.4s], memory [694.9mb]->[90.9mb]/[1gb], all_pools {[young] [604mb]->[0b]/[0b]}{[old] [89.9mb]->[89.>
    [2021-03-31T07:03:44,953][WARN ][o.e.m.f.FsHealthService  ] [node-1] health check of [/var/lib/elasticsearch/nodes/0] took [7148ms] which is above the warn threshold of [5s]
    [2021-03-31T07:11:03,918][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][young][16975][20] duration [11.3s], collections [1]/[12s], total [11.3s]/[15.7s], memory [130.9mb]->[90.8mb]/[1gb], all_pools {[young] [40mb]->[0b]/[0b]}{[old] [89.9mb]->[89.>
    [2021-03-31T07:11:06,610][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][16975] overhead, spent [11.3s] collecting in the last [12s]
    [2021-03-31T07:28:04,708][WARN ][o.e.m.f.FsHealthService  ] [node-1] health check of [/var/lib/elasticsearch/nodes/0] took [5557ms] which is above the warn threshold of [5s]
    [2021-03-31T07:30:30,545][WARN ][o.e.m.f.FsHealthService  ] [node-1] health check of [/var/lib/elasticsearch/nodes/0] took [5035ms] which is above the warn threshold of [5s]
    [2021-03-31T07:35:07,502][WARN ][o.e.m.f.FsHealthService  ] [node-1] health check of [/var/lib/elasticsearch/nodes/0] took [5732ms] which is above the warn threshold of [5s]
    [2021-03-31T07:35:12,985][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][young][17163][21] duration [4.9s], collections [1]/[3.4s], total [4.9s]/[20.6s], memory [126.8mb]->[130.8mb]/[1gb], all_pools {[young] [36mb]->[0b]/[0b]}{[old] [89.9mb]->[89.>
    [2021-03-31T07:35:16,582][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][17163] overhead, spent [4.9s] collecting in the last [3.4s]
    [2021-03-31T07:44:37,323][WARN ][o.e.h.AbstractHttpServerTransport] [node-1] handling request [null][POST][/reports/_count][Netty4HttpChannel{localAddress=/127.0.0.1:9200, remoteAddress=/127.0.0.1:37814}] took [9836ms] which is above the warn thresho>
    [2021-03-31T07:51:15,633][WARN ][o.e.m.f.FsHealthService  ] [node-1] health check of [/var/lib/elasticsearch/nodes/0] took [7832ms] which is above the warn threshold of [5s]
    [2021-03-31T08:00:57,701][WARN ][o.e.m.f.FsHealthService  ] [node-1] health check of [/var/lib/elasticsearch/nodes/0] took [7313ms] which is above the warn threshold of [5s]
    [2021-03-31T08:05:13,225][WARN ][o.e.m.f.FsHealthService  ] [node-1] health check of [/var/lib/elasticsearch/nodes/0] took [5179ms] which is above the warn threshold of [5s]
    [2021-03-31T08:07:50,096][WARN ][o.e.m.f.FsHealthService  ] [node-1] health check of [/var/lib/elasticsearch/nodes/0] took [6490ms] which is above the warn threshold of [5s]
    [2021-03-31T08:19:56,215][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][young][17648][23] duration [1.1s], collections [1]/[1.4s], total [1.1s]/[21.9s], memory [131mb]->[91mb]/[1gb], all_pools {[young] [40mb]->[0b]/[0b]}{[old] [89.9mb]->[89.9mb]/>
    [2021-03-31T08:19:56,957][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][17648] overhead, spent [1.1s] collecting in the last [1.4s]
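
To watch heap usage and GC counts while this is happening, the JVM section of the nodes stats API can be polled, e.g.:

    curl 'localhost:9200/_nodes/stats/jvm?pretty'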

I am not sure what the reason could be. Please suggest how I can solve this issue.

There's nothing in those logs that shows Elasticsearch shutting down. If there is part of the log that shows that, please post it.
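
If it is running under systemd, the service status and journal usually show how the process ended, e.g.:

    systemctl status elasticsearch.service
    journalctl -u elasticsearch.service --since "2 days ago"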

Maybe this will help:

    elasticsearch.service - Elasticsearch
         Loaded: loaded (/lib/systemd/system/elasticsearch.service; enabled; vendor preset: enabled)
         Active: failed (Result: signal) since Tue 2021-04-06 19:15:19 UTC; 1 day 10h ago
           Docs: https://www.elastic.co
       Main PID: 727 (code=killed, signal=KILL)
    Apr 06 19:00:57 products systemd[1]: Starting Elasticsearch...
    Apr 06 19:02:56 products systemd[1]: Started Elasticsearch.
    Apr 06 19:15:19 products systemd[1]: elasticsearch.service: Main process exited, code=killed, status=9/KILL
    Apr 06 19:15:19 products systemd[1]: elasticsearch.service: Failed with result 'signal'.
    Apr 06 19:15:19 products systemd[1]: elasticsearch.service: Unit process 1519 (controller) remains running after unit stopped.

Looks to me like something (someone, some process, some scanner) is sending a

    kill -9

i.e. sending a kill command to the Elasticsearch process.

For another user, a security scanner was killing unrecognized processes... could it be something like that?
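
One way to catch whatever is sending the signal, assuming auditd is available, is to audit kill(2) syscalls carrying SIGKILL and then search the audit log:

    auditctl -a always,exit -F arch=b64 -S kill -F a1=9 -k kill9
    ausearch -k kill9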

Could it be because of insufficient memory? Maybe Linux or the JVM itself is killing the Elasticsearch process.

My configuration is:

  • VPS: 1 CPU Core, 2GB RAM
  • Ubuntu 20.10
  • ElasticSearch Version: 7.8.0
  • Heap Size: -Xms1g -Xmx1g

Everything is running on the same VPS machine, web server, MySQL, elastic, etc.

Elasticsearch worked fine on my development server.
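
To see what is actually using the 2 GB on a box like this, standard tools are enough:

    free -h
    ps aux --sort=-%mem | head -n 10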

Yes, that could be it. That is a very small server to be running all of that on... less than my phone 🙂

The reason I asked is that the log says kill -9 (status=9/KILL), which is different from a process just dying on its own.

So I'm not sure... there could be some other process taking more resources, causing Elasticsearch to die. I am not sure of the exit code when Elasticsearch runs out of memory.

It could be the OOM killer, I guess?
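
If it is the OOM killer, the kernel log will show it; checking is something like:

    journalctl -k | grep -i "out of memory"
    dmesg -T | grep -i "killed process"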

Finally, I got this issue resolved by increasing the server memory from 2 GB to 4 GB. Due to insufficient memory on the VPS, the kernel itself was killing the JVM process, which ultimately shut down the Elasticsearch server.

I also decreased the heap size from 1 GB to 750 MB.
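
i.e. in jvm.options (same file as above):

    -Xms750m
    -Xmx750m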

The kernel log confirmed it:

    Apr 10 10:55:43 products kernel: Out of memory: Killed process 728 (java)

Thank you, Stephen and David, for the help. I really appreciate it.

You should definitely avoid that if this is meant for production. For test purposes, I guess it's fine.

But glad you solved the problem.

To echo @dadoonet: in production, only Elasticsearch should run on a single VM... no other applications...