Fatal error in network and heap

Hello,
I have the following configuration:
2 client nodes
3 master nodes
3 data+ingest nodes
My cluster is running with only 1 data node; the other 2 data nodes fail with the following errors:

    [2018-02-03T18:22:54,183][WARN ][o.e.m.j.JvmGcMonitorService] [es-data-02] [gc][1984] overhead, spent [1.4m] collecting in the last [1.4m]
    [2018-02-03T18:33:50,485][INFO ][o.e.m.j.JvmGcMonitorService] [es-data-02] [gc][old][1985][422] duration [10.9m], collections [76]/[10.9m], total [10.9m]/[52.4m], memory [3.8gb]->[3.8gb]/[3.8gb], all_pools {[young] [865.3mb]->[865.3mb]/[865.3mb]}{[survivor] [108mb]->[107.8mb]/[108.1mb]}{[old] [2.9gb]->[2.9gb]/[2.9gb]}
    [2018-02-03T18:33:50,486][WARN ][o.e.m.j.JvmGcMonitorService] [es-data-02] [gc][1985] overhead, spent [10.9m] collecting in the last [10.9m]
    [2018-02-03T18:34:38,367][ERROR][o.e.t.n.Netty4Utils      ] fatal error on the network layer
            at org.elasticsearch.transport.netty4.Netty4Utils.maybeDie(Netty4Utils.java:185)
            at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.exceptionCaught(Netty4MessageChannelHandler.java:83)
            at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:285)
            at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:264)
            at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:256)
            at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.handleReadException(AbstractNioByteChannel.java:104)
            at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:145)
            at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644)
            at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:544)
            at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498)
            at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
            at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
            at java.lang.Thread.run(Thread.java:745)
    [2018-02-03T18:42:31,063][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [es-data-02] fatal error in thread [elasticsearch[es-data-02][search][T#14]], exiting
    java.lang.OutOfMemoryError: Java heap space
    [2018-02-03T18:42:31,061][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [es-data-02] fatal error in thread [elasticsearch[es-data-02][generic][T#6]], exiting
    java.lang.OutOfMemoryError: Java heap space

    [2018-02-03T18:33:59,924][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [es-data-02] fatal error in thread [elasticsearch[es-data-02][management][T#3]], exiting
    java.lang.OutOfMemoryError: Java heap space
            at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:300) ~[?:1.8.0_111]
            at java.lang.StringCoding.encode(StringCoding.java:344) ~[?:1.8.0_111]
            at java.lang.String.getBytes(String.java:918) ~[?:1.8.0_111]
            at java.io.UnixFileSystem.canonicalize0(Native Method) ~[?:1.8.0_111]
            at java.io.UnixFileSystem.canonicalize(UnixFileSystem.java:172) ~[?:1.8.0_111]
            at java.io.File.getCanonicalPath(File.java:618) ~[?:1.8.0_111]
            at java.io.FilePermission$1.run(FilePermission.java:215) ~[?:1.8.0_111]
            at java.io.FilePermission$1.run(FilePermission.java:203) ~[?:1.8.0_111]
            at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_111]
            at java.io.FilePermission.init(FilePermission.java:203) ~[?:1.8.0_111]
            at java.io.FilePermission.<init>(FilePermission.java:277) ~[?:1.8.0_111]
            at java.lang.SecurityManager.checkRead(SecurityManager.java:888) ~[?:1.8.0_111]
            at sun.nio.fs.UnixPath.checkRead(UnixPath.java:795) ~[?:?]
            at sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:49) ~[?:?]
            at sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144) ~[?:?]
            at sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99) ~[?:?]
            at java.nio.file.Files.readAttributes(Files.java:1737) ~[?:1.8.0_111]
            at java.nio.file.Files.size(Files.java:2332) ~[?:1.8.0_111]
            at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:243) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
            at org.apache.lucene.store.FilterDirectory.fileLength(FilterDirectory.java:67) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
            at org.apache.lucene.store.FilterDirectory.fileLength(FilterDirectory.java:67) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
            at org.elasticsearch.index.store.Store$StoreStatsCache.estimateSize(Store.java:1402) ~[elasticsearch-5.6.3.jar:5.6.3]
            at org.elasticsearch.index.store.Store$StoreStatsCache.refresh(Store.java:1391) ~[elasticsearch-5.6.3.jar:5.6.3]
            at org.elasticsearch.index.store.Store$StoreStatsCache.refresh(Store.java:1378) ~[elasticsearch-5.6.3.jar:5.6.3]
            at org.elasticsearch.common.util.SingleObjectCache.getOrRefresh(SingleObjectCache.java:54) ~[elasticsearch-5.6.3.jar:5.6.3]
            at org.elasticsearch.index.store.Store.stats(Store.java:332) ~[elasticsearch-5.6.3.jar:5.6.3]
            at org.elasticsearch.index.shard.IndexShard.storeStats(IndexShard.java:703) ~[elasticsearch-5.6.3.jar:5.6.3]
            at org.elasticsearch.action.admin.indices.stats.CommonStats.<init>(CommonStats.java:177) ~[elasticsearch-5.6.3.jar:5.6.3]
            at org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:163) ~[elasticsearch-5.6.3.jar:5.6.3]
            at org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:47) ~[elasticsearch-5.6.3.jar:5.6.3]
            at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.onShardOperation(TransportBroadcastByNodeAction.java:433) ~[elasticsearch-5.6.3.jar:5.6.3]
            at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.messageReceived(TransportBroadcastByNodeAction.java:412) ~[elasticsearch-5.6.3.jar:5.6.3]

Would you please help me solve this issue?
Best

It looks like you are suffering from insufficient heap space. What is the full output of the cluster stats API?
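
For reference, it can be fetched with something like the following (host, port, and credentials are assumptions; with x-pack security enabled you will need to authenticate):

    curl -u elastic 'http://localhost:9200/_cluster/stats?human&pretty'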

Hello
{"_nodes":{"total":7,"successful":7,"failed":0},"cluster_name":"logserver","timestamp":1517731166673,"status":"yellow","indices":{"count":506,"shards":{"total":2090,"primaries":2090,"replication":0.0,"index":{"shards":{"min":1,"max":5,"avg":4.130434782608695},"primaries":{"min":1,"max":5,"avg":4.130434782608695},"replication":{"min":0.0,"max":0.0,"avg":0.0}}},"docs":{"count":510863423,"deleted":468685},"store":{"size":"310.3gb","size_in_bytes":333253899037,"throttle_time":"0s","throttle_time_in_millis":0},"fielddata":{"memory_size":"4.2kb","memory_size_in_bytes":4392,"evictions":0},"query_cache":{"memory_size":"8.5mb","memory_size_in_bytes":8934789,"total_count":458244,"hit_count":306880,"miss_count":151364,"cache_size":1078,"cache_count":2472,"evictions":1394},"completion":{"size":"0b","size_in_bytes":0},"segments":{"count":21355,"memory":"1.6gb","memory_in_bytes":1753280287,"terms_memory":"1.2gb","terms_memory_in_bytes":1356601416,"stored_fields_memory":"103.5mb","stored_fields_memory_in_bytes":108602544,"term_vectors_memory":"0b","term_vectors_memory_in_bytes":0,"norms_memory":"460.9kb","norms_memory_in_bytes":472000,"points_memory":"12.6mb","points_memory_in_bytes":13311835,"doc_values_memory":"261.5mb","doc_values_memory_in_bytes":274292492,"index_writer_memory":"43.1mb","index_writer_memory_in_bytes":45243776,"version_map_memory":"9.9mb","version_map_memory_in_bytes":10395017,"fixed_bit_set":"86.5kb","fixed_bit_set_memory_in_bytes":88584,"max_unsafe_auto_id_timestamp":1517413012754,"file_sizes":{}}},"nodes":{"count":{"total":7,"data":1,"coordinating_only":3,"master":3,"ingest":1},"versions":["5.6.3"],"os":{"available_processors":40,"allocated_processors":40,"names":[{"name":"Linux","count":7}],"mem":{"total":"57.3gb","total_in_bytes":61590376448,"free":"1.2gb","free_in_bytes":1341280256,"used":"56.1gb","used_in_bytes":60249096192,"free_percent":2,"used_percent":98}},"process":{"cpu":{"percent":12},"open_file_descriptors":{"min":334,"max":5136,"avg":1021}},"jvm":{"max_uptime":"58.8d","max_uptime_in_millis":5081490657,"versions":[{"version":"1.8.0_111","vm_name":"OpenJDK 64-Bit Server VM","vm_version":"25.111-b15","vm_vendor":"Oracle Corporation","count":7}],"mem":{"heap_used":"11.8gb","heap_used_in_bytes":12722490824,"heap_max":"31.6gb","heap_max_in_bytes":34037170176},"threads":697},"fs":{"total":"2.1tb","total_in_bytes":2403531415552,"free":"1.8tb","free_in_bytes":2009820860416,"available":"1.7tb","available_in_bytes":1902446678016,"spins":"true"},"plugins":[{"name":"x-pack","version":"5.6.3","description":"Elasticsearch Expanded Pack Plugin","classname":"org.elasticsearch.xpack.XPackPlugin","has_native_controller":true}],"network_types":{"transport_types":{"security4":7},"http_types":{"security4":7}}}}

Given the number of data nodes you have and the amount of heap assigned to these, you seem to have too many shards. Read this blog post for guidance on how many shards you should have in your cluster.
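
To put numbers on that: the stats above show 2090 primary shards holding 310.3gb, i.e. an average of roughly 333253899037 / 2090 ≈ 152 MB per shard, far below the multi-GB shard sizes usually recommended for logging workloads. A quick way to inspect per-shard sizes (host and credentials are assumptions for your setup):

    curl 'http://localhost:9200/_cat/shards?v&h=index,shard,prirep,store&s=store'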

ip           heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
172.24.69.21            7          98   0    0.00    0.02     0.05 -         -      es-client-03
172.24.69.14           12          98   0    0.00    0.02     0.05 -         -      es-client-02
172.24.69.16           81          97  33    5.48    5.30     5.24 di        -      es-data-01
172.24.69.13           12          98   0    0.00    0.04     0.05 -         -      es-client-01
172.24.69.20           58          98   2    0.01    0.05     0.05 m         *      es-master-02
172.24.69.19           16          98   0    0.00    0.01     0.05 m         -      es-master-01
172.24.69.22           14          98   0    0.00    0.01     0.05 m         -      es-master-03
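
(The table above is cat nodes output; assuming the default HTTP port, it can be reproduced with the command below, where the v flag prints the column header row.)

    curl 'http://localhost:9200/_cat/nodes?v'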
I have 3 data nodes, but 2 of them failed because of the fatal error.
This is my JVM config:

## JVM configuration

################################################################
## IMPORTANT: JVM heap size
################################################################
##
## You should always set the min and max JVM heap
## size to the same value. For example, to set
## the heap to 4 GB, set:
##
## -Xms4g
## -Xmx4g
##
## See https://www.elastic.co/guide/en/elasticsearch/reference/current/heap-size.html
## for more information
##
################################################################

# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space

-Xms8g
-Xmx8g

################################################################
## Expert settings
################################################################
##
## All settings below this section are considered
## expert settings. Don't tamper with them unless
## you understand what you are doing
##
################################################################

## GC configuration
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly

## optimizations

# disable calls to System#gc
-XX:+DisableExplicitGC

# pre-touch memory pages used by the JVM during initialization
-XX:+AlwaysPreTouch

## basic

# force the server VM (remove on 32-bit client JVMs)
-server

# explicitly set the stack size (reduce to 320k on 32-bit client JVMs)
-Xss1m

# set to headless, just in case
-Djava.awt.headless=true

# ensure UTF-8 encoding by default (e.g. filenames)
-Dfile.encoding=UTF-8

# use our provided JNA always versus the system one
-Djna.nosys=true

# use old-style file permissions on JDK9
-Djdk.io.permissionsUseCanonicalPath=true

# flags to configure Netty
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0

# log4j 2
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true
-Dlog4j.skipJansi=true

## heap dumps

# generate a heap dump when an allocation from the Java heap fails
# heap dumps are created in the working directory of the JVM
-XX:+HeapDumpOnOutOfMemoryError

# specify an alternative path for heap dumps
# ensure the directory exists and has sufficient space
#-XX:HeapDumpPath=${heap.dump.path}

## GC logging

#-XX:+PrintGCDetails
#-XX:+PrintGCTimeStamps
#-XX:+PrintGCDateStamps
#-XX:+PrintClassHistogram
#-XX:+PrintTenuringDistribution
#-XX:+PrintGCApplicationStoppedTime

# log GC status to a file with time stamps
# ensure the directory exists
#-Xloggc:${loggc}

# By default, the GC log file will not rotate.
# By uncommenting the lines below, the GC log file
# will be rotated every 128MB at most 32 times.
#-XX:+UseGCLogFileRotation
#-XX:NumberOfGCLogFiles=32
#-XX:GCLogFileSize=128M

# Elasticsearch 5.0.0 will throw an exception on unquoted field names in JSON.
# If documents were already indexed with unquoted fields in a previous version
# of Elasticsearch, some operations may throw errors.
#
# WARNING: This option will be removed in Elasticsearch 6.0.0 and is provided
# only for migration purposes.
#-Delasticsearch.json.allow_unquoted_field_names=true

I believe you need to either add more heap or reduce the amount of heap used, e.g. by reducing the shard count. As your average shard size looks quite small, I would first look at reducing the number of shards, for example with an index template as sketched below.
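
A minimal sketch of such a template for new time-based indices (the index pattern and host are assumptions; adjust them to your naming scheme):

    curl -XPUT 'http://localhost:9200/_template/fewer_shards' -H 'Content-Type: application/json' -d'
    {
      "template": "logstash-*",
      "settings": {
        "index.number_of_shards": 1
      }
    }'

Existing indices cannot have their primary shard count changed in place, but on 5.6 you can shrink them with the _shrink API or combine them into larger indices with _reindex.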

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.