WARN message (Elasticsearch)

Hi

Three days ago I started to receive some messages in the elasticsearch.log file, and sometimes my Elastic platform freezes and doesn't load the data. The error in Kibana is:

"{"statusCode":500,"error":"Internal Server Error","message":"An internal server error occurred"}".

The messages in the elasticsearch.log file are:

[2019-03-26T16:40:43,447][WARN ][o.e.m.j.JvmGcMonitorService] [fycetJG] [gc][527283] overhead, spent [763ms] collecting in the last [1s]
[2019-03-26T16:40:44,520][WARN ][o.e.m.j.JvmGcMonitorService] [fycetJG] [gc][527284] overhead, spent [669ms] collecting in the last [1s]
[2019-03-26T16:40:48,970][INFO ][o.e.m.j.JvmGcMonitorService] [fycetJG] [gc][527288] overhead, spent [673ms] collecting in the last [1.4s]
[2019-03-26T16:40:49,970][WARN ][o.e.m.j.JvmGcMonitorService] [fycetJG] [gc][527289] overhead, spent [763ms] collecting in the last [1s]
[2019-03-26T16:40:55,450][WARN ][o.e.m.j.JvmGcMonitorService] [fycetJG] [gc][527294] overhead, spent [761ms] collecting in the last [1.4s]
[2019-03-26T16:40:56,450][WARN ][o.e.m.j.JvmGcMonitorService] [fycetJG] [gc][527295] overhead, spent [821ms] collecting in the last [1s]
[2019-03-26T16:40:59,689][WARN ][o.e.m.j.JvmGcMonitorService] [fycetJG] [gc][527298] overhead, spent [745ms] collecting in the last [1.2s]
[2019-03-26T16:41:02,065][INFO ][o.e.m.j.JvmGcMonitorService] [fycetJG] [gc][527300] overhead, spent [686ms] collecting in the last [1.3s]
[2019-03-26T16:41:06,187][WARN ][o.e.m.j.JvmGcMonitorService] [fycetJG] [gc][527304] overhead, spent [688ms] collecting in the last [1.1s]
[2019-03-26T16:41:08,428][WARN ][o.e.m.j.JvmGcMonitorService] [fycetJG] [gc][527305] overhead, spent [1.3s] collecting in the last [2.2s]
[2019-03-26T16:41:10,022][WARN ][o.e.m.j.JvmGcMonitorService] [fycetJG] [gc][527306] overhead, spent [1.4s] collecting in the last [1.5s]
[2019-03-26T16:41:15,801][WARN ][o.e.m.j.JvmGcMonitorService] [fycetJG] [gc][527311] overhead, spent [1.4s] collecting in the last [1.7s]
[2019-03-26T16:41:16,802][WARN ][o.e.m.j.JvmGcMonitorService] [fycetJG] [gc][527312] overhead, spent [679ms] collecting in the last [1s]
[2019-03-26T16:41:18,802][WARN ][o.e.m.j.JvmGcMonitorService] [fycetJG] [gc][527314] overhead, spent [691ms] collecting in the last [1s]

On Google I found that this could be some sharp errors, but the truth is I don't know how to check these sharps or how to modify them.

Please help me by letting me know if someone has had the same error in the past. Thanks.

Regards

Hi Luis,

I think you mean "shards", not "sharps", but I'm quite new here, so please excuse me if I'm wrong.

Assuming you are indeed talking about shards, my understanding is that each shard is a Lucene index. As such, each shard comes with a price in terms of resources, so I can see how the number of shards could affect the available resources in your system.
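
If it helps, and assuming Elasticsearch is listening on the default localhost:9200, the cat API will list every shard and its size, for example:

curl -s 'localhost:9200/_cat/shards?v'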

The GC monitor service seems to be a monitor for garbage collection, and judging by the timings in your log, it looks like you have more or less continuous garbage collection going on. This is not good, as during GC the JVM will not be responsive.

From my limited understanding, I would guess that this is basically a lack of RAM and/or a JVM heap size that is too low for the needs of your environment.

Could you share with us a little bit about your Elasticsearch dimensions? In terms of RAM, the heap size assigned to the JVM (run "ps -ef | grep java" and check the values of -Xms and -Xmx), the number and size of your indices, as well as the number of shards?
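
In case it is useful, and again assuming the node answers on the default localhost:9200, these commands should show most of that information (column names are from the _cat API; adjust as needed for your version):

ps -ef | grep java                                            # look for the -Xms and -Xmx values
curl -s 'localhost:9200/_cat/nodes?v&h=name,heap.max,ram.max' # heap size vs. physical RAM
curl -s 'localhost:9200/_cat/indices?v'                       # number and size of indices
curl -s 'localhost:9200/_cat/shards' | wc -l                  # total number of shards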

4 Likes

That is exactly right @pup_seba :slight_smile:
Great understanding of the inner workings already for being new to this as you say!

The issue with the heap is that it needs to be adequately sized, as we describe in detail in https://www.elastic.co/blog/a-heap-of-trouble. There's a picture further down in that post showing the nice sawtooth pattern that you are looking for.
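
For reference, the heap is normally set in config/jvm.options (the exact path depends on how Elasticsearch was installed); as a minimal sketch, using 4g purely as an example value:

-Xms4g
-Xmx4g

Both values should be equal, and the usual guidance in that post is to keep the heap at no more than about half of the physical RAM and below roughly 32 GB so compressed object pointers stay enabled.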

The number of shards a node should have is directly correlated with its heap size (https://www.elastic.co/blog/how-many-shards-should-i-have-in-my-elasticsearch-cluster), and the old-generation GC seen here, as mentioned, basically stops the node from doing anything else while it runs the cleanup.
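
If I remember that second post correctly, the rule of thumb is to stay below roughly 20 shards per GB of configured heap, so a node with, say, a 4 GB heap would ideally hold fewer than about 80 shards.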

This looks like a typical case of oversharding and with the information requested by Sebastián it will be easy to tell if that is so.

4 Likes

Hi

First of all, yes, it is shards, sorry.

Modifying the heap size in jvm.options and restarting Elasticsearch fixed the issue; it is working fine now.
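
For anyone else hitting this: after editing jvm.options the node needs a restart to pick up the change; on a systemd-based install that would be something like:

sudo systemctl restart elasticsearch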

Thanks for your response.

Regards.

1 Like

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.