High CPU usage in Monitoring Server due to ES

I am running CentOS, and I have the ELK stack (Elasticsearch, Logstash, Kibana) plus Graphite and Grafana on this VM.
When I run top I can see that Elasticsearch is the culprit:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
899 elasticsearch 20 0 6110m 1.2g 35m S 100.0 31.3 1262:28 java

I don't know why it started acting like this. I was told hot threads would help, but I am new to Elasticsearch and need help understanding the output.

My hot_threads: https://gist.github.com/anonymous/7f75f702c0a5edf788bfa62ba83ffd21
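For reference, I pulled that output with the hot threads API, with something like:

curl -XGET 'localhost:9200/_nodes/hot_threads'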

Thanks

There's not really that much there to explain this.
What about your ES logs, can you see much GC there?
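Something like this should show the GC lines, assuming the default log location of a package install (the file name follows the cluster name):

grep '\[gc\]' /var/log/elasticsearch/elasticsearch.log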

I can see GC mentioned only once, and that was more than a week ago:

[2016-03-12 10:29:59,364][INFO ][monitor.jvm ] [Amelia Voght] [gc][old][209984][63663] duration [5.1s], collections [1]/[5.4s], total [5.1s]/[13.6h], memory [952.1mb]->[960.5mb]/[1007.3mb], all_pools {[young] [94.5mb]->[102.9mb]/[133.1mb]}{[survivor] [0b]->[0b]/[16.6mb]}{[old] [857.6mb]->[857.6mb]/[857.6mb]}

I don't know if it is related, but the cluster health is yellow:

curl http://localhost:9200/_cluster/health?pretty
{
  "cluster_name" : "elasticsearch",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 1200,
  "active_shards" : 1200,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 1200
}

No it's not; the yellow status just means your replica shards have nowhere to be allocated with only one data node. But that's a very high shard count for a single node.
I'd look to reduce that.
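If you do want the cluster to show green on a single node, you could also drop the replica count to 0 on existing indices; a sketch (this applies to every index at once):

curl -XPUT 'localhost:9200/_settings' -d '{ "index" : { "number_of_replicas" : 0 } }'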

Would it help if I reduce the shard count by archiving old indices, since I can restore them to a local ES instance whenever I need to look at historical data?
What is a healthy number of shards for a single node?
We may also add more nodes, I guess, if that is what is causing the CPU usage.

If you are using Logstash then update the template to use 1 or 3 shards, or switch to weekly indices rather than daily.

Hi warkolm,

Yes, we are using Logstash and keeping history, hence the large numbers, I think.
Where can I make those changes? In elasticsearch.yml, or is there a Logstash config file that I need to change?
The person who knew Elasticsearch is no longer with us, so I am relying on your help and Google.

Thanks

You need to look at the _template API, then find the logstash template and change the shard count :slight_smile:
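A rough sketch of the flow (untested; when you PUT the template back, include your existing mappings in the body as well, otherwise they are dropped from the template):

# fetch the current template, then PUT it back with a shard count added to the settings
curl -XGET 'localhost:9200/_template/logstash?pretty'
curl -XPUT 'localhost:9200/_template/logstash' -d '{
  "template" : "logstash-*",
  "settings" : {
    "index.refresh_interval" : "5s",
    "number_of_shards" : 1
  }
}'

Note this only affects indices created after the change; existing indices keep their current shard count.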

Thanks for the reply, I will test this soon

This is the template according to the API.
Should I increase the 5 second refresh interval? :slight_smile:

-> curl -XGET localhost:9200/_template/?pretty
{
  "logstash" : {
    "order" : 0,
    "template" : "logstash-*",
    "settings" : {
      "index.refresh_interval" : "5s"
    },
    "mappings" : {
      "_default_" : {
        "dynamic_templates" : [ {
          "string_fields" : {
            "mapping" : {
              "index" : "analyzed",
              "omit_norms" : true,
              "type" : "string",
              "fields" : {
                "raw" : {
                  "index" : "not_analyzed",
                  "ignore_above" : 256,
                  "type" : "string"
                }
              }
            },
            "match_mapping_type" : "string",
            "match" : "*"
          }
        } ],
        "properties" : {
          "geoip" : {
            "dynamic" : true,
            "path" : "full",
            "properties" : {
              "location" : {
                "type" : "geo_point"
              }
            },
            "type" : "object"
          },
          "@version" : {
            "index" : "not_analyzed",
            "type" : "string"
          }
        },
        "_all" : {
          "enabled" : true
        }
      }
    },
    "aliases" : { }
  }
}

I think I found the reason. I set up a collector for Elasticsearch metrics and found out that the JVM is using almost all of its assigned memory.

I grepped for the elasticsearch process and its command line looks like "/usr/bin/java -Xms256m -Xmx1g -Xss256k", which limits its heap to 1 GB. I still don't know what the 256m and 256k do, though :slight_smile:

Correct me if I am wrong, but increasing the limit should fix my problem, right?

Thanks

To answer my own question, increasing the heap size fixed my CPU problem. Thanks for the help, Mark.
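For the record: -Xms and -Xmx are the initial and maximum heap sizes and -Xss is the per-thread stack size, so the heap was capped at 1 GB. I raised it by setting ES_HEAP_SIZE and restarting Elasticsearch; roughly this, assuming the sysconfig layout of the RPM install and keeping the heap at no more than about half of the machine's RAM:

# /etc/sysconfig/elasticsearch (the path depends on how Elasticsearch was installed)
ES_HEAP_SIZE=2g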