We are using a small Elasticsearch cluster of three nodes, version 1.0.1.
Each node has 7 GB RAM. Our software creates daily indexes for storing its
data; each daily index is around 5 GB. Unfortunately, for some reason
Elasticsearch eats up all the RAM and hangs the node, even though the heap
size is capped at 6 GB. So we decided to use monit to restart it when it
reaches a memory limit of 90%. That works, but sometimes we get errors like
this:
[2014-03-22 16:56:04,943][DEBUG][action.search.type] [es-00] [product-22-03-2014][0], node[jbUDVzuvS5GTM7iOG8iwzQ], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@687dc039]
org.elasticsearch.search.fetch.FetchPhaseExecutionException: [product-22-03-2014][0]: query[filtered(ToParentBlockJoinQuery(filtered(history.created:[1392574921000 TO *])->cache(_type:__history)))->cache(_type:product)],from[0],size[1000],sort[<custom:"history.created": org.elasticsearch.index.search.nested.NestedFieldComparatorSource@15e4ece9>]: Fetch Failed [Failed to fetch doc id [7263214]]
    at org.elasticsearch.search.fetch.FetchPhase.loadStoredFields(FetchPhase.java:230)
    at org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:156)
    at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:332)
    at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteFetch(SearchServiceTransportAction.java:304)
    at org.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryAndFetchAction.java:71)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:216)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$4.run(TransportSearchTypeAction.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.io.EOFException: seek past EOF: MMapIndexInput(path="/opt/elasticsearch/main/nodes/0/indices/product-22-03-2014/0/index/_9lz.fdt")
    at org.apache.lucene.store.ByteBufferIndexInput.seek(ByteBufferIndexInput.java:174)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.visitDocument(CompressingStoredFieldsReader.java:229)
    at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:276)
    at org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:110)
    at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:196)
    at org.elasticsearch.search.fetch.FetchPhase.loadStoredFields(FetchPhase.java:228)
    ... 9 more
[2014-03-22 16:56:04,944][DEBUG][action.search.type] [es-00] All shards failed for phase: [query_fetch]
According to our logs, this seems to happen when one or two nodes get
restarted. Stranger still, the same shard got corrupted on all nodes of the
cluster. Why could this happen? How can we fix it? And can you suggest how
to fix the memory usage?
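For reference, the monit rule mentioned above looks roughly like this (the pidfile path and init scripts are specific to our setup and shown only as an illustration):

```
check process elasticsearch with pidfile /var/run/elasticsearch/elasticsearch.pid
  start program = "/etc/init.d/elasticsearch start"
  stop program  = "/etc/init.d/elasticsearch stop"
  # restart when the process (and its children) use more than 90% of system memory
  if memory usage > 90% then restart
```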
We are using a small Elasticsearch cluster of three nodes, version 1.0.1.
Each node has 7 GB RAM. Our software creates daily indexes for storing its
data; each daily index is around 5 GB. Unfortunately, for some reason
Elasticsearch eats up all the RAM and hangs the node, even though the heap
size is capped at 6 GB...
If you set the heap to 6 GB and your RAM is 7 GB, the whole Java process
needs 6 GB + ~2 GB = 8 GB. You understand this will exceed your main memory.
50% is a good rule of thumb for machines with around 4-16 GB of RAM, because
the ES process relies heavily on the OS filesystem buffers, which the OS
uses for faster I/O. If you have 7 GB, the rule-of-thumb calculation is
roughly: 3.5 GB for the ES heap, 2 GB for ES process buffers and internals,
1 GB for the OS kernel, and 1 GB for filesystem buffers. In this scenario
the OS can work with the best performance possible.
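On the 1.x series the heap is normally set through the ES_HEAP_SIZE environment variable; the file locations below are the usual Debian/RPM package defaults and may differ on your installation:

```
# /etc/default/elasticsearch (Debian) or /etc/sysconfig/elasticsearch (RPM)
ES_HEAP_SIZE=3500m

# equivalently, set the JVM flags directly (min and max should match
# to avoid heap resizing pauses):
# ES_JAVA_OPTS="-Xms3500m -Xmx3500m"
```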
OK, thanks, I'll try setting the heap to 3.5 GB. But it is the Java process
itself that eats all the memory, up to 96%. Anyway, I'll try it and write
back if it helps.
Do you know why an index might become corrupted on both the primary and the
replica?
On Monday, March 24, 2014 at 13:00:34 UTC+4, Jörg Prante wrote:
If you set the heap to 6 GB and your RAM is 7 GB, the whole Java process
needs 6 GB + ~2 GB = 8 GB. You understand this will exceed your main memory.
50% is a good rule of thumb for machines with around 4-16 GB of RAM, because
the ES process relies heavily on the OS filesystem buffers, which the OS
uses for faster I/O. If you have 7 GB, the rule-of-thumb calculation is
roughly: 3.5 GB for the ES heap, 2 GB for ES process buffers and internals,
1 GB for the OS kernel, and 1 GB for filesystem buffers. In this scenario
the OS can work with the best performance possible.