Hi Team,
We are facing heap issues on Elasticsearch 1.7.3 on all of the data nodes. Please find below the steps we have implemented in our cluster.
Step-1: We have 16 data nodes in total, and each node runs 3 instances (data1, data2 and data3), so we have 48 data instances, plus 3 masters and 16 separate ingest (search) nodes. All the data nodes are bare metal, and each node has a 7.1TB disk.
Filesystem                                          Size  Used  Avail  Use%  Mounted on
/dev/sdi2                                           132G   16G   110G   13%  /
devtmpfs                                            252G     0   252G    0%  /dev
tmpfs                                               252G     0   252G    0%  /dev/shm
tmpfs                                               252G   26M   252G    1%  /run
tmpfs                                               252G     0   252G    0%  /sys/fs/cgroup
/dev/mapper/Source--ES--eph-volume--367978823--14   7.0T  642G   6.4T    9%  /app
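For reference, per-instance heap pressure can be checked with the _cat APIs. This is a sketch, assuming the http.port 9202 from Step-3 and that the instance is reachable on localhost:

# Heap and RAM usage per instance as the cluster sees it
curl 'localhost:9202/_cat/nodes?v&h=host,name,heap.percent,ram.percent'
# Shard counts and disk usage per instance
curl 'localhost:9202/_cat/allocation?v'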
Step-2: Please find the ES process configuration
elastic+ 13580 1 48 Dec05 ? 11:42:26 /bin/java -Xms30g -Xmx30g -Djava.awt.headless=true -XX:+UseG1GC -XX:+AggressiveOpts -XX:+DoEscapeAnalysis -XX:+UseCompressedOops -XX:MaxGCPauseMillis=200 -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -XX:+ParallelRefProcEnabled -XX:-ResizePLAB -XX:ParallelGCThreads=20 -XX:+UseStringDeduplication -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port=3335 -Des.max-open-files=true -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/share/elasticsearch/logs/heapdump.hprof -XX:+DisableExplicitGC -Dfile.encoding=UTF-8 -Delasticsearch -Des.foreground=yes -Des.path.home=/usr/share/elasticsearch -cp :/usr/share/elasticsearch/lib/elasticsearch-1.7.3.jar:/usr/share/elasticsearch/lib/*:/usr/share/elasticsearch/lib/sigar/* -Des.pidfile=/var/run/elasticsearch/10.37.38.124-data1/elasticsearch.pid -Des.default.path.home=/usr/share/elasticsearch -Des.default.path.logs=/usr/local/var/log/elasticsearch/10.37.38.124-data1 -Des.default.path.data=/app/data/elasticsearch/10.37.38.124-data1 -Des.default.path.conf=/etc/elasticsearch/data1 org.elasticsearch.bootstrap.Elasticsearch
ES_HEAP_SIZE=30g
MAX_LOCKED_MEMORY=unlimited
Additional Java OPTS
es_java_opts: "$ES_JAVA_OPTS -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port= -Des.max-open-files=true",
ES_GC_OPTS="-XX:+UseG1GC -XX:+AggressiveOpts -XX:+DoEscapeAnalysis -XX:+UseCompressedOops -XX:MaxGCPauseMillis=200 -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -XX:+ParallelRefProcEnabled -XX:-ResizePLAB -XX:ParallelGCThreads=20 -XX:+UseStringDeduplication -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port=3335 -Des.max-open-files=true"
export ES_GC_OPTS
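To see whether the old generation ever shrinks after G1 mixed collections, GC occupancy can be watched with jstat from the JDK. A sketch using PID 13580 from the ps output above, sampling every 5 seconds (run as the same user as the ES process):

# O = old-gen occupancy %, FGC = full GC count; an O column that climbs
# and never drops points at live data on the heap, not GC tuning
jstat -gcutil 13580 5000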
Step-3: Please find the settings.
action.auto_create_index: true
action.destructive_requires_name: true
action.disable_delete_all_indices: true
bootstrap.mlockall: true
cluster.name: Cluster_name
cluster.routing.allocation.same_shard.host: true
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: master_node1:9301,master_node2:9301,master_node3:9301
http.port: 9202
index.mapper.dynamic: true
index.merge.policy.use_compound_file: false
index.number_of_replicas: 0
index.number_of_shards: 96
index.query.bool.max_clause_count: 10000
index.refresh_interval: 1000s
indices.fielddata.cache.size: 10%
indices.recovery.max_bytes_per_sec: 60mb
network.host: 0.0.0.0
node.data: true
node.master: false
script.inline: false
script.stored: false
script.file: false
script.groovy.sandbox.enabled: false
threadpool.bulk.queue_size: 300
threadpool.index.queue_size: 300
transport.tcp.port: 9302
threadpool.bulk.size: 60
threadpool.bulk.type: fixed
threadpool.index.size: 60
threadpool.index.type: fixed
threadpool.search.queue_size: 400
threadpool.search.size: 60
threadpool.search.type: fixed
discovery.zen.fd.ping_timeout: 180s
discovery.zen.fd.ping_interval: 60s
discovery.zen.fd.ping_retries: 3
indices.cluster.send_refresh_mapping: false
index.merge.policy.max_merge_at_once: 10
index.merge.policy.reclaim_deletes_weight: 2.0
index.merge.policy.max_merged_segment: 5GB
index.merge.policy.expunge_deletes_allowed: 10
index.merge.policy.segments_per_tier: 10
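To check which consumers are actually holding the heap under these settings (fielddata, filter cache, segment memory), the 1.x stats APIs can be queried. Again a sketch assuming http.port 9202:

# Fielddata size per node and field
curl 'localhost:9202/_cat/fielddata?v'
# Full per-node index stats, including segments and filter cache memory
curl 'localhost:9202/_nodes/stats/indices?pretty'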
Step-4: Limits configuration settings under /etc/security/limits.conf:
*                soft    nproc      65535
*                hard    nproc      65535
*                soft    nofile     65535
*                hard    nofile     65535
elasticsearch    soft    memlock    unlimited
elasticsearch    hard    memlock    unlimited
elasticsearch    soft    nproc      65535
elasticsearch    hard    nproc      65535
elasticsearch    soft    nofile     65535
elasticsearch    hard    nofile     65535
app              soft    nofile     16384
app              hard    nofile     16384
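Since limits.conf only applies to new login sessions, the limits the running process actually picked up can be verified straight from /proc, using the PID from Step-2:

# Effective limits of the live ES process
grep -E 'processes|open files|locked memory' /proc/13580/limits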
We have checked sestatus, and SELinux is already disabled on all the nodes (we are using CentOS 7).
sestatus
SELinux status: disabled
Free-memory:
free -g
              total        used        free      shared  buff/cache   available
Mem:            503          91         410           0           1         410
Swap:             0           0           0
We are also dropping the caches every 5 minutes:
#Drop the page cache
*/5 * * * * sync; echo 1 > /proc/sys/vm/drop_caches
We implemented all of the steps above, but the data nodes are still sitting at 24 to 25GB of the 30GB heap (over 80%) the whole time, GC is not releasing it, the cluster goes red, and nodes go down.
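To find out what is actually occupying those 24-25GB, a live-object histogram can be taken on one stuck node with jmap from the JDK. A sketch using the PID from Step-2 (note that :live forces a full GC and briefly pauses the JVM, so this is for one node, not a cron job; run as the ES user):

# Top 30 classes by retained instances/bytes on the heap
jmap -histo:live 13580 | head -n 30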
Please suggest any settings or configuration we may have missed to fix this heap issue.