Hello,
I ran into issues with Elasticsearch and I’m looking for advice or a solution.
My Ruby on Rails application uses Elasticsearch through the elasticsearch-rails gem.
Production server info:
Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz # 8 cores
16 GB RAM
200 GB free disk space
# curl localhost:9200
{
"name" : "qWfkDHn",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "bO29LJBOQKmOIM8CBsYDBQ",
"version" : {
"number" : "5.5.0",
"build_hash" : "260387d",
"build_date" : "2017-06-30T23:16:05.735Z",
"build_snapshot" : false,
"lucene_version" : "6.6.0"
},
"tagline" : "You Know, for Search"
}
# curl -XGET localhost:9200/_cat/indices
green open people Dq5QQ4rvR2yo6vKWPPCI3Q 1 0 436988 101 10.7gb 9.7gb
yellow open file_attachments p2bh604YSDy3yK1CioT9uA 5 1 55 4 923kb 923kb
green open companies QZ-O449uT4ypPVfcsq4Neg 1 0 658840 20640 720mb 720mb
yellow open external_links 881Q6MQVScOwMJndB4pI3g 5 1 6 0 50.2kb 50.2kb
yellow open personal_emails 5E5u9_KkSpSjy4dD-Fi7Xg 5 1 13 0 96.5kb 96.5kb
# curl -XGET localhost:9200/_cat/shards
personal_emails 4 p STARTED 4 29.4kb 127.0.0.1 qWfkDHn
personal_emails 4 r UNASSIGNED
personal_emails 2 p STARTED 2 14.8kb 127.0.0.1 qWfkDHn
personal_emails 2 r UNASSIGNED
personal_emails 3 p STARTED 1 7.8kb 127.0.0.1 qWfkDHn
personal_emails 3 r UNASSIGNED
personal_emails 1 p STARTED 2 14.9kb 127.0.0.1 qWfkDHn
personal_emails 1 r UNASSIGNED
personal_emails 0 p STARTED 4 29.5kb 127.0.0.1 qWfkDHn
personal_emails 0 r UNASSIGNED
external_links 4 p STARTED 0 194b 127.0.0.1 qWfkDHn
external_links 4 r UNASSIGNED
external_links 2 p STARTED 1 8.3kb 127.0.0.1 qWfkDHn
external_links 2 r UNASSIGNED
external_links 3 p STARTED 2 16.6kb 127.0.0.1 qWfkDHn
external_links 3 r UNASSIGNED
external_links 1 p STARTED 2 16.5kb 127.0.0.1 qWfkDHn
external_links 1 r UNASSIGNED
external_links 0 p STARTED 1 8.4kb 127.0.0.1 qWfkDHn
external_links 0 r UNASSIGNED
companies 0 p STARTED 658840 720mb 127.0.0.1 qWfkDHn
file_attachments 4 p STARTED 8 204.1kb 127.0.0.1 qWfkDHn
file_attachments 4 r UNASSIGNED
file_attachments 3 p STARTED 11 188.3kb 127.0.0.1 qWfkDHn
file_attachments 3 r UNASSIGNED
file_attachments 2 p STARTED 12 175.8kb 127.0.0.1 qWfkDHn
file_attachments 2 r UNASSIGNED
file_attachments 1 p STARTED 16 253.9kb 127.0.0.1 qWfkDHn
file_attachments 1 r UNASSIGNED
file_attachments 0 p STARTED 8 100.6kb 127.0.0.1 qWfkDHn
file_attachments 0 r UNASSIGNED
people 0 p STARTED 436342 10.7gb 127.0.0.1 qWfkDHn
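If I read this output right, all the UNASSIGNED rows are replica shards, which can never be allocated on a single-node cluster and would explain the yellow health. My assumption (please correct me if wrong) is that setting replicas to 0 would clear that, e.g.:

```shell
# Assumption on my part: on a single node, replicas can never be assigned,
# so dropping number_of_replicas to 0 should turn the yellow indices green.
# This applies the setting to all indices at once.
curl -XPUT 'localhost:9200/_settings' -H 'Content-Type: application/json' -d '{
  "index": { "number_of_replicas": 0 }
}'
```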
The above output was captured during a full reindex (about 50% in).
The people dataset consists of around 110K records and it’s the most heavily used.
A single person document may look like this:
Elasticsearch was running for weeks with no problems at all. Then the search API suddenly stopped working due to ES unavailability. The Elasticsearch service had stopped, and the latest logs contained:
java.lang.OutOfMemoryError: Java heap space
I bumped the assigned heap up to 6 GB in /etc/default/elasticsearch:
ES_JAVA_OPTS="-Xms6g -Xmx6g"
Total server memory usage went from 8.5 GB to 12.5 GB. According to service elasticsearch status, the Elasticsearch process now seems to use around 10 GB.
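For reference, my understanding of the sizing guidance (which may well be wrong) is that heap should not exceed half the physical RAM and stay below ~31 GB, so 8 GB would be the ceiling on this box; the extra memory the process uses beyond the heap is presumably off-heap:

```shell
# My understanding of the sizing guidance (could be wrong): heap should be
# at most 50% of physical RAM, and stay below ~31 GB for compressed oops.
total_ram_gb=16
heap_gb=$(( total_ram_gb / 2 ))
if [ "$heap_gb" -gt 31 ]; then heap_gb=31; fi
echo "max recommended heap: ${heap_gb}g"   # 8g on this 16 GB box
```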
Unfortunately, Elasticsearch still shuts down from time to time with the same error, and after starting it back up the people shard health shows as RED.
Search is then unavailable:
[2018-05-15T14:15:58,678][WARN ][r.suppressed ] path: /people/person/_search, params: {index=people, type=person}
org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed
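At that point I’d guess the cluster health reports red; my understanding is that something like the following shows which shards are the problem (treat the exact columns as my assumption):

```shell
# Check overall cluster health, then list shard states with the reason
# a shard is unassigned (column names per my reading of the 5.x docs).
curl 'localhost:9200/_cluster/health?pretty'
curl 'localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason'
```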
Other exceptions I saw during that time
java.util.concurrent.RejectedExecutionException: event executor terminated
org.elasticsearch.node.NodeClosedException: node closed {qWfkDHn}{qWfkDHnQR7mookwia6CnVA}{PQXCk-8sRFi-mlSlLrAQAQ}{127.0.0.1}{127.0.0.1:9300}
[2018-05-15T14:16:52,409][WARN ][o.e.i.e.Engine ] [qWfkDHn] [people][0] tried to fail engine but engine is already failed. ignoring. [failed to recover from translog]
org.elasticsearch.index.engine.EngineException: failed to recover from translog
org.elasticsearch.action.UnavailableShardsException: [people][0] primary shard is not active Timeout
[2018-05-15T14:16:52,411][WARN ][o.e.i.c.IndicesClusterStateService] [qWfkDHn] [[people][0]] marking and sending shard failed due to [shard failure, reason [index]]
java.lang.ArrayIndexOutOfBoundsException: -65536
Sometimes it seems able to recover from that state on its own; reindexing also helps.
How should I approach this problem? Is there any way to optimize my indices, or should I just get more memory? I truly lack any intuition in this case and I don’t understand why it would run out of memory.
Thank you very much!
PS. I just noticed that the people index has only 1 shard. Would switching to 5 shards help?
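In case it matters, the way I’d try it (based on my reading of the 5.x docs, so treat the exact calls and the people_v2 name as assumptions on my part) is to create a new index with 5 shards and copy the data over with the _reindex API:

```shell
# Sketch only, assuming ES 5.x. The index name "people_v2" is made up
# for illustration; mappings would need to be applied to it first.
curl -XPUT 'localhost:9200/people_v2' -H 'Content-Type: application/json' -d '{
  "settings": { "number_of_shards": 5, "number_of_replicas": 0 }
}'

# Copy all documents from the old index into the new one.
curl -XPOST 'localhost:9200/_reindex' -H 'Content-Type: application/json' -d '{
  "source": { "index": "people" },
  "dest":   { "index": "people_v2" }
}'
```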