JVM Heap Size

We have a fairly new Elasticsearch setup. We seem to be hitting a memory error that we cannot overcome. I've increased the JVM heap settings (-Xms and -Xmx) to one half of the installed memory.
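
For reference, the usual Elasticsearch heap-sizing rule of thumb is to set -Xms and -Xmx to the same value, use no more than half of physical RAM, and stay below the compressed-oops cutoff (roughly 32 GB). Here is a minimal sketch of that arithmetic. The 16 GB machine size is an assumption inferred from the -Xms8g -Xmx8g flags in the process listing further down, since the RAM size is not stated here. The function names and the 31 GB cap are illustrative choices.

```python
def recommended_heap_gb(total_ram_gb, oops_cutoff_gb=31):
    """Half of RAM in whole GB, capped below the compressed-oops cutoff.

    oops_cutoff_gb=31 is an assumed conservative cap, not a value from this thread.
    """
    half = total_ram_gb // 2
    return max(1, min(half, oops_cutoff_gb))


def jvm_options_lines(heap_gb):
    """Build matching -Xms/-Xmx lines so the heap never resizes at runtime."""
    return [f"-Xms{heap_gb}g", f"-Xmx{heap_gb}g"]


# Assumed 16 GB of installed RAM (not stated in the post).
print(jvm_options_lines(recommended_heap_gb(16)))  # ['-Xms8g', '-Xmx8g']
```

If the heap is already at half of RAM, raising it further mostly takes memory away from the filesystem cache, so the next step is usually to look at what is filling the heap.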

Any tips on adjusting the JVM?


[2017-09-20T09:47:44,005][INFO ][o.e.m.j.JvmGcMonitorService] [elastic1.xxx.university.edu] [gc][old][353][80] duration [2.4m], collections [21]/[1.2m], total [2.4m]/[9.2m], memory [7.8gb]->[7.8gb]/[7.8gb], all_pools {[young] [998.5mb]->[998.5mb]/[998.5mb]}{[survivor] [120.1mb]->[124.7mb]/[124.7mb]}{[old] [6.7gb]->[6.7gb]/[6.7gb]}
[2017-09-20T09:47:44,005][WARN ][o.e.m.j.JvmGcMonitorService] [elastic1.xxx.university.edu] [gc][353] overhead, spent [2.4m] collecting in the last [1.2m]
[2017-09-20T09:45:07,033][DEBUG][o.e.a.b.TransportShardBulkAction] [elastic1.xxx.university.edu] [bro-2017.09.19.21][4] failed to execute bulk item (index) BulkShardRequest [[bro-2017.09.19.21][4]] containing [74] requests
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [version]
at org.elasticsearch.index.mapper.FieldMapper.parse(FieldMapper.java:298) ~[elasticsearch-5.5.0.jar:5.5.0]
at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:468) ~[elasticsearch-5.5.0.jar:5.5.0]

at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.5.0.jar:5.5.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_91]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_91]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_91]

Caused by: java.lang.NumberFormatException: For input string: "1.1"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) ~[?:1.8.0_91]
at java.lang.Long.parseLong(Long.java:589) ~[?:1.8.0_91]
at java.lang.Long.parseLong(Long.java:631) ~[?:1.8.0_91]
at org.elasticsearch.common.xcontent.support.AbstractXContentParser.longValue(AbstractXContentParser.java:172) ~[elasticsearch-5.5.0.jar:5.5.0]
at org.elasticsearch.index.mapper.NumberFieldMapper$NumberType$7.parse(NumberFieldMapper.java:740) ~[elasticsearch-5.5.0.jar:5.5.0]
at org.elasticsearch.index.mapper.NumberFieldMapper$NumberType$7.parse(NumberFieldMapper.java:719) ~[elasticsearch-5.5.0.jar:5.5.0]
at org.elasticsearch.index.mapper.NumberFieldMapper.parseCreateField(NumberFieldMapper.java:1058) ~[elasticsearch-5.5.0.jar:5.5.0]
at org.elasticsearch.index.mapper.FieldMapper.parse(FieldMapper.java:287) ~[elasticsearch-5.5.0.jar:5.5.0]
... 36 more
[2017-09-20T09:47:44,022][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [elastic1.xxx.university.edu] fatal error in thread [elasticsearch[elastic1.xxx.university.edu][bulk][T#19]], exiting
java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.util.packed.PackedLongValues$Builder.<init>(PackedLongValues.java:185) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.util.packed.DeltaPackedLongValues$Builder.<init>(DeltaPackedLongValues.java:59) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.util.packed.PackedLongValues.deltaPackedBuilder(PackedLongValues.java:55) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.util.packed.PackedLongValues.deltaPackedBuilder(PackedLongValues.java:60) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.index.NormValuesWriter.<init>(NormValuesWriter.java:41) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.index.DefaultIndexingChain$PerField.setInvertState(DefaultIndexingChain.java:692) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.index.DefaultIndexingChain$PerField.<init>(DefaultIndexingChain.java:682) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.index.DefaultIndexingChain.getOrAddField(DefaultIndexingChain.java:622) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:445) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:403) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:232) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:478) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1571) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1316) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.elasticsearch.index.engine.InternalEngine.index(InternalEngine.java:661) ~[elasticsearch-5.5.0.jar:5.5.0]
at org.elasticsearch.index.engine.InternalEngine.indexIntoLucene(InternalEngine.java:605) ~[elasticsearch-5.5.0.jar:5.5.0]
at org.elasticsearch.index.engine.InternalEngine.index(InternalEngine.java:505) ~[elasticsearch-5.5.0.jar:5.5.0]
at org.elasticsearch.index.shard.IndexShard.index(IndexShard.java:556) ~[elasticsearch-5.5.0.jar:5.5.0]
at org.elasticsearch.index.shard.IndexShard.index(IndexShard.java:545) ~[elasticsearch-5.5.0.jar:5.5.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.executeIndexRequestOnPrimary(TransportShardBulkAction.java:484) ~[elasticsearch-5.5.0.jar:5.5.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:143) ~[elasticsearch-5.5.0.jar:5.5.0]
at
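
The JvmGcMonitorService line at the top of this log carries the key signal: memory [7.8gb]->[7.8gb]/[7.8gb] means the heap was full before the old-generation collection and still full after it, so the collector is reclaiming nothing. Below is a rough sketch for pulling those numbers out of such a line. The regular expression and the helper name are illustrative, and the sample line is abbreviated from the log above.

```python
import re

# Matches the "memory [before]->[after]/[max]" part of a GC monitor log line.
GC_MEMORY = re.compile(
    r"memory \[([\d.]+)(mb|gb)\]->\[([\d.]+)(mb|gb)\]/\[([\d.]+)(mb|gb)\]"
)
UNIT_MB = {"mb": 1.0, "gb": 1024.0}


def heap_after_gc(line):
    """Return (before_mb, after_mb, max_mb, after/max) or None if no match."""
    m = GC_MEMORY.search(line)
    if m is None:
        return None
    before = float(m.group(1)) * UNIT_MB[m.group(2)]
    after = float(m.group(3)) * UNIT_MB[m.group(4)]
    total = float(m.group(5)) * UNIT_MB[m.group(6)]
    return before, after, total, after / total


# Abbreviated copy of the [gc][old] line from the log above.
sample_line = (
    "[gc][old][353][80] duration [2.4m], collections [21]/[1.2m], "
    "total [2.4m]/[9.2m], memory [7.8gb]->[7.8gb]/[7.8gb]"
)

before, after, total, ratio = heap_after_gc(sample_line)
print(f"heap after GC: {after:.0f} MB of {total:.0f} MB ({ratio:.0%})")
```

A ratio that stays near 100% after old-generation collections usually points to live data that does not fit in the heap rather than to a GC tuning problem.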

Kibana is also showing some issues:

[root@elastic1 kibana]# systemctl status kibana -l
● kibana.service - Kibana
Loaded: loaded (/etc/systemd/system/kibana.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2017-09-20 10:07:53 EDT; 48s ago
Main PID: 4791 (node)
CGroup: /system.slice/kibana.service
└─4791 /usr/share/kibana/bin/../node/bin/node --no-warnings /usr/share/kibana/bin/../src/cli -c /etc/kibana/kibana.yml

Sep 20 10:08:36 elastic1.xxx.university.edu kibana[4791]: {"type":"response","@timestamp":"2017-09-20T14:08:36Z","tags":[],"pid":4791,"method":"get","statusCode":304,"req":{"url":"/plugins/kibana/assets/settings.svg","method":"get","headers":{"host":"elastic1.xxx.university.edu:5601","connection":"keep-alive","if-none-match":"\"4f859e27d4917026ff1590805887902b14ce79d5-gzip\"","if-modified-since":"Fri, 30 Jun 2017 23:32:08 GMT","user-agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.86 Safari/537.36","accept":"image/webp,image/apng,image/*,*/*;q=0.8","referer":"http://elastic1.xxx.university.edu:5601/app/kibana","accept-encoding":"gzip, deflate","accept-language":"en-US,en;q=0.8"},"remoteAddress":"130.64.40.229","userAgent":"130.64.40.229","referer":"http://elastic1.xxx.university.edu:5601/app/kibana"},"res":{"statusCode":304,"responseTime":7,"contentLength":9},"message":"GET /plugins/kibana/assets/settings.svg 304 7ms - 9.0B"}
Sep 20 10:08:36 elastic1.xxx.university.edu kibana[4791]: {"type":"response","@timestamp":"2017-09-20T14:08:36Z","tags":[],"pid":4791,"method":"get","statusCode":304,"req":{"url":"/ui/fonts/open_sans/open_sans_v13_latin_regular.woff2","method":"get","headers":{"host":"elastic1.xxx.university.edu:5601","connection":"keep-alive","origin":"http://elastic1.xxx.university.edu:5601","if-none-match":"\"afc44700053c9a28f9ab26f6aec4862ac1d0795d\"","if-modified-since":"Fri, 30 Jun 2017 23:32:08 GMT","user-agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.86 Safari/537.36","accept":"*/*","referer":"http://elastic1.xxx.university.edu:5601/app/kibana","accept-encoding":"gzip, deflate","accept-language":"en-US,en;q=0.8"},"remoteAddress":"130.64.40.229","userAgent":"130.64.40.229","referer":"http://elastic1.xxx.university.edu:5601/app/kibana"},"res":{"statusCode":304,"responseTime":9,"contentLength":9},"message":"GET /ui/fonts/open_sans/open_sans_v13_latin_regular.woff2 304 9ms - 9.0B"}
Sep 20 10:08:36 elastic1.xxx.university.edu kibana[4791]: {"type":"response","@timestamp":"2017-09-20T14:08:36Z","tags":[],"pid":4791,"method":"get","statusCode":304,"req":{"url":"/bundles/4b5a84aaf1c9485e060c503a0ff8cadb.woff2","method":"get","headers":{"host":"elastic1.xxx.university.edu:5601","connection":"keep-alive","origin":"http://elastic1.xxx.university.edu:5601","if-none-match":"\"574ea2698c03ae9477db2ea3baf460ee32f1a7ea\"","if-modified-since":"Fri, 30 Jun 2017 23:32:08 GMT","user-agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.86 Safari/537.36","accept":"*/*","referer":"http://elastic1.xxx.university.edu:5601/bundles/commons.style.css?v=15382","accept-encoding":"gzip, deflate","accept-language":"en-US,en;q=0.8"},"remoteAddress":"130.64.40.229","userAgent":"130.64.40.229","referer":"http://elastic1.xxx.university.edu:5601/bundles/commons.style.css?v=15382"},"res":{"statusCode":304,"responseTime":8,"contentLength":9},"message":"GET /bundles/4b5a84aaf1c9485e060c503a0ff8cadb.woff2 304 8ms - 9.0B"}
Sep 20 10:08:36 elastic1.xxx.university.edu kibana[4791]: {"type":"response","@timestamp":"2017-09-20T14:08:36Z","tags":[],"pid":4791,"method":"get","statusCode":304,"req":{"url":"/bundles/0cebf3d61338c454670b1c5bdf5d6d8d.svg","method":"get","headers":{"host":"elastic1.xxx.university.edu:5601","connection":"keep-alive","if-none-match":"\"d52234e52fd4e96d20f52f4c03c0cedb8ab5fe17-gzip\"","if-modified-since":"Fri, 30 Jun 2017 23:32:08 GMT","user-agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.86 Safari/537.36","accept":"image/webp,image/apng,image/*,*/*;q=0.8","referer":"http://elastic1.xxx.university.edu:5601/bundles/commons.style.css?v=15382","accept-encoding":"gzip, deflate","accept-language":"en-US,en;q=0.8"},"remoteAddress":"130.64.40.229","userAgent":"130.64.40.229","referer":"http://elastic1.xxx.university.edu:5601/bundles/commons.style.css?v=15382"},"res":{"statusCode":304,"responseTime":9,"contentLength":9},"message":"GET /bundles/0cebf3d61338c454670b1c5bdf5d6d8d.svg 304 9ms - 9.0B"}
Sep 20 10:08:36 elastic1.xxx.university.edu kibana[4791]: {"type":"response","@timestamp":"2017-09-20T14:08:36Z","tags":[],"pid":4791,"method":"get","statusCode":304,"req":{"url":"/plugins/kibana/assets/play-circle.svg","method":"get","headers":{"host":"elastic1.xxx.university.edu:5601","connection":"keep-alive","if-none-match":"\"2433ecf38258f7121c835670b6993600e7657717-gzip\"","if-modified-since":"Fri, 30 Jun 2017 23:32:08 GMT","user-agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.86 Safari/537.36","accept":"image/webp,image/apng,image/*,*/*;q=0.8","referer":"http://elastic1.xxx.university.edu:5601/app/kibana","accept-encoding":"gzip, deflate","accept-language":"en-US,en;q=0.8"},"remoteAddress":"130.64.40.229","userAgent":"130.64.40.229","referer":"http://elastic1.xxx.university.edu:5601/app/kibana"},"res":{"statusCode":304,"responseTime":10,"contentLength":9},"message":"GET /plugins/kibana/assets/play-circle.svg 304 10ms - 9.0B"}
Sep 20 10:08:38 elastic1.xxx.university.edu kibana[4791]: {"type":"log","@timestamp":"2017-09-20T14:08:38Z","tags":["warning","elasticsearch","admin"],"pid":4791,"message":"Unable to revive connection: http://10.246.158.2:9200/"}

Everything runs in this error state for a while, and then the Elasticsearch service fails on two of the three nodes.

Nodes 1 and 3 fail, while node 2 goes into a tailspin looking for the other nodes:
[root@elastic1 kibana]# systemctl status elasticsearch
● elasticsearch.service - Elasticsearch
Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Wed 2017-09-20 09:47:53 EDT; 28min ago
Docs: http://www.elastic.co
Process: 4244 ExecStart=/usr/share/elasticsearch/bin/elasticsearch -p ${PID_DIR}/elasticsearch.pid --quiet -Edefault.path.logs=${LOG_DIR} -Edefault.path.data=${DATA_DIR} -Edefault.path.conf=${CONF_DIR} (code=exited, status=127)
Process: 4241 ExecStartPre=/usr/share/elasticsearch/bin/elasticsearch-systemd-pre-exec (code=exited, status=0/SUCCESS)
Main PID: 4244 (code=exited, status=127)

Sep 20 09:32:42 elastic1.xxx.university.edu systemd[1]: Starting Elasticsearch...
Sep 20 09:32:42 elastic1.xxx.university.edu systemd[1]: Started Elasticsearch.
Sep 20 09:47:53 elastic1.xxx.university.edu systemd[1]: elasticsearch.service: main process exited, code=exited, status=127/n/a
Sep 20 09:47:53 elastic1.xxx.university.edu systemd[1]: Unit elasticsearch.service entered failed state.
Sep 20 09:47:53 elastic1.xxx.university.edu systemd[1]: elasticsearch.service failed.
[root@elastic1 kibana]# tail -f /data/logs/elasticsearch/ttsinfosec-es01.log
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:69) ~[elasticsearch-5.5.0.jar:5.5.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:939) ~[elasticsearch-5.5.0.jar:5.5.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:908) ~[elasticsearch-5.5.0.jar:5.5.0]
at org.elasticsearch.action.support.replication.ReplicationOperation.execute(ReplicationOperation.java:113) ~[elasticsearch-5.5.0.jar:5.5.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.onResponse(TransportReplicationAction.java:322) ~[elasticsearch-5.5.0.jar:5.5.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.onResponse(TransportReplicationAction.java:264) ~[elasticsearch-5.5.0.jar:5.5.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$1.onResponse(TransportReplicationAction.java:888) ~[elasticsearch-5.5.0.jar:5.5.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$1.onResponse(TransportReplicationAction.java:885) ~[elasticsearch-5.5.0.jar:5.5.0]
at org.elasticsearch.index.shard.IndexShardOperationsLock.acquire(IndexShardOperationsLock.java:147) ~[elasticsearch-5.5.0.jar:5.5.0]
at org.elasticsearch.index.shard.IndexShard.acquirePrimaryOperationLock(IndexShard.java:1657) ~[elasticsearch-5.5.0.jar:5.5.0]

I've now restarted the service on all three nodes and am waiting to see what happens:

systemctl status elasticsearch

● elasticsearch.service - Elasticsearch
Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2017-09-20 10:22:51 EDT; 6s ago
Docs: http://www.elastic.co
Process: 4826 ExecStartPre=/usr/share/elasticsearch/bin/elasticsearch-systemd-pre-exec (code=exited, status=0/SUCCESS)
Main PID: 4830 (java)
CGroup: /system.slice/elasticsearch.service
└─4830 /bin/java -Xms8g -Xmx8g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+DisableExplicitGC -XX:+AlwaysPreTouch -server -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Dj...

Sep 20 10:22:51 elastic1.xxx.university.edu systemd[1]: Starting Elasticsearch...
Sep 20 10:22:51 elastic1.xxx.university.edu systemd[1]: Started Elasticsearch.

How large is your heap? How many indices and shards?
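
One way to answer the shard question is to run GET _cat/shards?h=index,shard,prirep,state against a node and count the rows. The sketch below assumes that output has been pasted in as plain text. The helper name is made up, and the sample rows are invented for illustration, apart from the index name bro-2017.09.19.21, which comes from the log above. That name looks like an hourly index pattern, in which case the shard count could grow quickly, but that is only a guess from the name.

```python
from collections import Counter


def summarize_shards(cat_output):
    """Count distinct indices and shard copies in `_cat/shards` text output.

    Expects whitespace-separated columns: index, shard, prirep, state.
    """
    per_index = Counter()
    for row in cat_output.strip().splitlines():
        fields = row.split()
        if len(fields) < 4:
            continue  # skip blank or malformed rows
        index = fields[0]
        per_index[index] += 1
    return len(per_index), sum(per_index.values())


# Hypothetical sample rows; only the first index name appears in the thread.
sample = """\
bro-2017.09.19.21 4 p STARTED
bro-2017.09.19.21 4 r STARTED
bro-2017.09.19.22 0 p STARTED
"""

indices, shards = summarize_shards(sample)
print(f"{indices} indices, {shards} shard copies")  # 2 indices, 3 shard copies
```

Each shard carries some fixed heap overhead, so a large shard count on an 8 GB heap would be worth reporting alongside the heap size.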
