Elasticsearch 2.4.3 very high cpu usage

ES version :2.4.3
java version:1.8.0_121
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)

My production cluster has only one ES node; it is the master node and also the data node. It has 4 CPUs and 8 GB RAM, the ES heap size is 3900 MB, and there is 1 index.

elasticsearch.yml:

```
node.name: node_master
node.master: true
node.data: true
http.cors.enabled: true
http.cors.allow-origin: "*"
http.cors.allow-headers: Authorization
action.auto_create_index: true
script.inline: on
script.indexed: on
```

The command used to start ES:

```
nohup ./elasticsearch -Des.insecure.allow.root=true
```

We use Logstash to synchronize MySQL data to ES and use the ES client API to batch delete expired data every day at 1 a.m. The online data volume stays at around 500 MB, and it had been running normally for more than a year.
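(For reference, the nightly cleanup is essentially a bulk delete. A hypothetical sketch of the equivalent REST calls, with made-up document ids and assuming the mapping type is `item`:)

```
POST /item/_bulk
{ "delete" : { "_type" : "item", "_id" : "expired-0001" } }
{ "delete" : { "_type" : "item", "_id" : "expired-0002" } }
```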

logstash jdbc.conf:

```
jdbc_driver_class => "com.mysql.jdbc.Driver"
jdbc_paging_enabled => "true"
jdbc_page_size => "50000"
jdbc_fetch_size => "50000"
use_column_value => "true"
record_last_run => "true"
tracking_column => "updatetime"
last_run_metadata_path => "/usr/elasticsearch/logstash-2.4.1/run_metadata"
statement_filepath => "jdbc.sql"
```

It suddenly had a problem last night: the CPU usage was very high. The top command showed 376%.

```
jstat -gc <pid>
```

I didn't see frequent GC happening.

I tried to rebuild the indices, but the problem still exists.

The strange thing is that the ES in the test environment runs normally. It has 2 CPUs and 4 GB RAM, with an ES heap size of 1900 MB.

Hot threads:

1. hot_log_1
2. hot_log_2
3. hot_log_3
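(The dumps above come from the nodes hot threads API; I assume a call along these lines, with the thread count and sampling interval as illustrative parameters:)

```
GET /_nodes/hot_threads?threads=3&interval=500ms
```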

What is the output of:

```
GET /
GET /_cat/nodes?v
GET /_cat/health?v
GET /_cat/indices?v
# Optionally
GET /_cat/shards?v
```

If some outputs are too big, please share them on gist.github.com and link them here.

Please format your code, logs, or configuration files using the </> icon as explained in this guide, and not the citation button. It will make your post more readable.

Or use markdown style like:

```
CODE
```

This is the icon to use if you are not using markdown format:

There's a live preview panel for exactly this reason.

Lots of people read these forums, and many of them will simply skip over a post that is difficult to read, because it's just too large an investment of their time to try and follow a wall of badly formatted text.

Thanks for your suggestion.

```
GET /
{
  "name" : "node_master",
  "cluster_name" : "es",
  "cluster_uuid" : "fT5AHLDaTm67f6mJHa6tdg",
  "version" : {
    "number" : "2.4.3",
    "build_hash" : "d38a34e7b75af4e17ead16f156feffa432b22be3",
    "build_timestamp" : "2016-12-07T16:28:56Z",
    "build_snapshot" : false,
    "lucene_version" : "5.5.2"
  },
  "tagline" : "You Know, for Search"
}

GET /_cat/nodes?v
host ip  heap.percent ram.percent load node.role master name
***  ***            3          34 0.01 d         *      node_master

GET /_cat/health?v
epoch      timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1559776603 23:16:43  es      yellow          1         1      5   5    0    0        5             0                  -                 50.0%

GET /_cat/indices?v
health status index pri rep docs.count docs.deleted store.size pri.store.size
yellow open   item    5   1     105216       144169      395mb          395mb

GET /_cat/shards?v
index shard prirep state      docs  store  ip  node
item  1     p      STARTED    21030 80.5mb *** node_master
item  1     r      UNASSIGNED
item  3     p      STARTED    21021 79.8mb *** node_master
item  3     r      UNASSIGNED
item  2     p      STARTED    20914 71.9mb *** node_master
item  2     r      UNASSIGNED
item  4     p      STARTED    21073 80.9mb *** node_master
item  4     r      UNASSIGNED
item  0     p      STARTED    21178 81.8mb *** node_master
item  0     r      UNASSIGNED
```

Hi,

I had a similar problem with a 1.x version; it was caused by too many deleted documents (I had ~50% deleted documents). But in your case you have ~150% deleted documents. :astonished:

```
health status index pri rep docs.count docs.deleted
yellow open   item    5   1     105216       144169
```
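(If you want to see where those deleted documents sit, the cat segments API lists deleted docs per segment; something like:)

```
GET /_cat/segments/item?v
```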

I moved my data to a daily index and the problem was solved.

We use logstash to synchronize MySQL data to ES and use the ES client api to batch delete expired data every day at 1am

You'd be better off going with daily indices and deleting the whole old index instead of deleting documents.
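For example (just a sketch with made-up index names; Logstash's elasticsearch output can write to a dated index such as `item-%{+YYYY.MM.dd}`, and the nightly job then drops whole indices instead of documents):

```
# index today's data into a dated index (type "item" assumed)
PUT /item-2019.06.05/item/1
{ "title": "example document" }

# expire old data by deleting the whole daily index
DELETE /item-2019.05.01
```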

I can't find the exact post about how deletion works, but I think I read it on this blog:
http://blog.mikemccandless.com/


Thank you for your reply. I tried to delete the index with es_head and then rebuild it; the problem is still there, but the ideas you suggested are quite new to me.

Good catch!

@fgd123 you can maybe try calling the force merge API to remove the deleted documents.
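A minimal example (on 2.x the endpoint is `_forcemerge`; `only_expunge_deletes` only rewrites segments containing deletes):

```
POST /item/_forcemerge?only_expunge_deletes=true
```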

I'm surprised that you are having issues with a ~4 GB heap and so few documents.
But this may be caused by a bug or by the GC running for a long time.
That said, you are using 2.4.3, which is really old. What about upgrading to 2.4.6 at least? Better to go to 7.1.1, BTW, if you can.

Yes, upgrading to the latest version would be the best outcome, but going from a low version to a high one means considering compatibility in all aspects, and it may take more time. Currently I am using the test environment's ES; it is like a bomb, and I don't know when it will crash again. I hope the online ES can recover first. I have restarted and rebuilt the index, but it still doesn't work.

Also, I just thought about one thing: I had a high-CPU problem on my VM because my HDD was nearly full. Just mentioning it in case, as deleting the index should fix it anyway.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.