Status Red : "java.lang.OutOfMemoryError: Java heap space" causing unassigned shards

Hi everyone,

I've set up an ES cluster with 4 data nodes (6 threads, 32 GB of RAM, and 500 GB of disk each), with Logstash running on one of them.

My goal is to put 600 GB of log data into ES using Logstash, but let's start with 3 GB.

After hours of processing, Logstash can't reach Elasticsearch anymore, and I got these logs:

[2018-04-08T02:07:49,915][WARN ][o.e.m.j.JvmGcMonitorService] [ws55-dn1] [gc][11926] overhead, spent [6.3s] collecting in the last [6.5s]
[2018-04-08T02:07:53,207][WARN ][o.e.m.j.JvmGcMonitorService] [ws55-dn1] [gc][11927] overhead, spent [3.1s] collecting in the last [3.2s]
[2018-04-08T02:07:59,108][WARN ][o.e.m.j.JvmGcMonitorService] [ws55-dn1] [gc][11928] overhead, spent [5.8s] collecting in the last [5.9s]
[2018-04-08T02:08:02,611][WARN ][o.e.m.j.JvmGcMonitorService] [ws55-dn1] [gc][11929] overhead, spent [3.4s] collecting in the last [3.5s]
[2018-04-08T02:08:05,123][WARN ][o.e.m.j.JvmGcMonitorService] [ws55-dn1] [gc][11930] overhead, spent [2.4s] collecting in the last [2.5s]
[2018-04-08T02:08:19,796][WARN ][o.e.m.j.JvmGcMonitorService] [ws55-dn1] [gc][11931] overhead, spent [14.5s] collecting in the last [14.6s]
[2018-04-08T02:08:35,026][WARN ][o.e.m.j.JvmGcMonitorService] [ws55-dn1] [gc][11932] overhead, spent [15s] collecting in the last [15.2s]
[2018-04-08T02:16:03,762][ERROR][o.e.t.n.Netty4Utils      ] fatal error on the network layer
[...]
[2018-04-08T02:17:29,694][ERROR][o.e.t.n.Netty4Utils      ] fatal error on the network layer
[...]
[2018-04-08T02:17:29,716][ERROR][o.e.t.n.Netty4Utils      ] fatal error on the network layer
[...]
[2018-04-08T02:17:29,818][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [ws55-dn1] fatal error in thread [elasticsearch[ws55-dn1][management][T#1]], exiting
java.lang.OutOfMemoryError: Java heap space
[2018-04-08T02:16:16,836][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [ws55-dn1] fatal error in thread [elasticsearch[ws55-dn1][bulk][T#3]], exiting
java.lang.OutOfMemoryError: Java heap space 

I can't even check my cluster health:

curl -XGET 'localhost:9200/_cluster/health?pretty'

{
  "error" : {
    "root_cause" : [
      {
        "type" : "master_not_discovered_exception",
        "reason" : null
      }
    ],
    "type" : "master_not_discovered_exception",
    "reason" : null
  },
  "status" : 503
}

But after restarting all the nodes, I finally get:

curl -XGET 'localhost:9200/_cluster/health?pretty'

{
  "cluster_name" : "airbus",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 5,
  "number_of_data_nodes" : 4,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 4,
  "unassigned_shards" : 11387,
  "delayed_unassigned_shards" : 8540,
  "number_of_pending_tasks" : 12,
  "number_of_in_flight_fetch" : 4880,
  "task_max_waiting_in_queue_millis" : 33609,
  "active_shards_percent_as_number" : 0.0
}

11387 unassigned shards ...

What do you think about this?

  • Are there too many shards?
    I'm indexing my data with %{hostname}-%{date}, so I have around 128 * 30 = 3840 indices for my first 3 GB of log data.

  • Not enough cluster capacity?
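For scale, here is a back-of-the-envelope shard count. This is only a sketch: it assumes Elasticsearch's historical defaults of 5 primary shards plus 1 replica per index (10 shards per index); my actual index settings may differ, which would explain why the health output shows a different total.

```shell
# Rough shard math for a %{hostname}-%{date} index pattern,
# assuming default settings of 5 primaries + 1 replica per index.
hosts=128
days=30
indices=$((hosts * days))    # one index per host per day
shards=$((indices * 10))     # 5 primaries + 5 replicas each
echo "$indices indices -> $shards shards"
# → 3840 indices -> 38400 shards
```

Tens of thousands of shards on 4 data nodes is far beyond what the cluster masters can comfortably manage, regardless of the raw data volume.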

Thank you in advance for your help

Yes, that sounds like far too many indices and shards for a cluster that size. Read this blog post for some guidance on shards and sharding strategies.
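One way to act on this advice (a sketch, not something tested against this cluster): drop the hostname from the index name so you get ~30 daily indices instead of ~3840, and cap each index at one primary shard via an index template. The template name `logstash-lowshard` below is illustrative; the `index_patterns` key is for ES 6.x, while older versions use `"template"` instead.

```shell
# Hypothetical template: 1 primary + 1 replica for every matching index.
curl -XPUT 'localhost:9200/_template/logstash-lowshard' \
  -H 'Content-Type: application/json' -d '{
  "index_patterns": ["logstash-*"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}'

# In the Logstash elasticsearch output, index by date only and keep the
# hostname as a document field rather than in the index name, e.g.:
#   index => "logstash-%{+YYYY.MM.dd}"
```

With 30 daily indices at 2 shards each, the cluster would hold ~60 shards instead of tens of thousands, which is well within what 4 data nodes can handle.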

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.