Out of memory on start with 38GB index

Hi,

I am doing all my tests on a copy of a 38GB production index, with ES 1.4.2. I
have tried several memory settings and virtual machine sizes, but ES fails to
start on a Linux system with 48GB of memory and a 32GB ES heap.
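
For completeness, the 32GB heap is set through the standard ES_HEAP_SIZE
variable (shown below as it looks on our install; the exact file location
depends on how ES was packaged):

# /etc/default/elasticsearch -- location varies by packaging
# ES_HEAP_SIZE sets both -Xms and -Xmx via bin/elasticsearch.in.sh
ES_HEAP_SIZE=32g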

Searching for similar issues, I
encountered https://github.com/elasticsearch/elasticsearch/issues/8394,
which is still open and looks fairly similar to my problem.

The debug output at startup looks like this:

[2015-01-14 12:00:48,710][DEBUG][indices.cluster ] [Saint Elmo] [mailspool][1] creating shard
[2015-01-14 12:00:48,710][DEBUG][index.service ] [Saint Elmo] [mailspool] creating shard_id [1]
[2015-01-14 12:00:48,791][DEBUG][index.deletionpolicy ] [Saint Elmo] [mailspool][1] Using [keep_only_last] deletion policy
[2015-01-14 12:00:48,793][DEBUG][index.merge.policy ] [Saint Elmo] [mailspool][1] using [tiered] merge mergePolicy with expunge_deletes_allowed[10.0], floor_segment[2mb], max_merge_at_once[10], max_merge_at_once_explicit[30], max_merged_segment[5gb], segments_per_tier[10.0], reclaim_deletes_weight[2.0]
[2015-01-14 12:00:48,794][DEBUG][index.merge.scheduler ] [Saint Elmo] [mailspool][1] using [concurrent] merge scheduler with max_thread_count[2], max_merge_count[4]
[2015-01-14 12:00:48,797][DEBUG][index.shard.service ] [Saint Elmo] [mailspool][1] state: [CREATED]
[2015-01-14 12:00:48,797][DEBUG][index.translog ] [Saint Elmo] [mailspool][1] interval [5s], flush_threshold_ops [2147483647], flush_threshold_size [200mb], flush_threshold_period [30m]
[2015-01-14 12:00:48,801][DEBUG][index.shard.service ] [Saint Elmo] [mailspool][1] state: [CREATED]->[RECOVERING], reason [from gateway]
[2015-01-14 12:00:48,801][DEBUG][index.gateway ] [Saint Elmo] [mailspool][1] starting recovery from local ...
[2015-01-14 12:00:48,805][DEBUG][river.cluster ] [Saint Elmo] processing [reroute_rivers_node_changed]: execute
[2015-01-14 12:00:48,805][DEBUG][river.cluster ] [Saint Elmo] processing [reroute_rivers_node_changed]: no change in cluster_state
[2015-01-14 12:00:48,814][INFO ][gateway ] [Saint Elmo] recovered [1] indices into cluster_state
[2015-01-14 12:00:48,814][DEBUG][cluster.service ] [Saint Elmo] processing [local-gateway-elected-state]: done applying updated cluster_state (version: 2)
[2015-01-14 12:00:48,840][DEBUG][index.engine.internal ] [Saint Elmo] [mailspool][1] starting engine
[2015-01-14 12:00:58,406][DEBUG][cluster.service ] [Saint Elmo] processing [routing-table-updater]: execute
[2015-01-14 12:00:58,407][DEBUG][gateway.local ] [Saint Elmo] [mailspool][4]: throttling allocation [[mailspool][4], node[null], [P], s[UNASSIGNED]] to [[[Saint Elmo][gOgAuHo4SXyfyuPpws0Usw][es][inet[/172.16.45.250:9300]]]] on primary allocation
[2015-01-14 12:00:58,407][DEBUG][gateway.local ] [Saint Elmo] [mailspool][2]: throttling allocation [[mailspool][2], node[null], [P], s[UNASSIGNED]] to [[[Saint Elmo][gOgAuHo4SXyfyuPpws0Usw][es][inet[/172.16.45.250:9300]]]] on primary allocation
[2015-01-14 12:00:58,407][DEBUG][gateway.local ] [Saint Elmo] [mailspool][3]: throttling allocation [[mailspool][3], node[null], [P], s[UNASSIGNED]] to [[[Saint Elmo][gOgAuHo4SXyfyuPpws0Usw][es][inet[/172.16.45.250:9300]]]] on primary allocation
[2015-01-14 12:00:58,408][DEBUG][gateway.local ] [Saint Elmo] [mailspool][0]: throttling allocation [[mailspool][0], node[null], [P], s[UNASSIGNED]] to [[[Saint Elmo][gOgAuHo4SXyfyuPpws0Usw][es][inet[/172.16.45.250:9300]]]] on primary allocation
[2015-01-14 12:00:58,408][DEBUG][cluster.service ] [Saint Elmo] processing [routing-table-updater]: no change in cluster_state
[2015-01-14 12:01:31,619][WARN ][index.engine.internal ] [Saint Elmo] [mailspool][1] failed engine [refresh failed]

java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.util.FixedBitSet.<init>(FixedBitSet.java:187)
    at org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet(MultiTermQueryWrapperFilter.java:104)
    at org.elasticsearch.index.cache.filter.weighted.WeightedFilterCache$FilterCacheFilterWrapper.getDocIdSet(WeightedFilterCache.java:177)
    at org.elasticsearch.common.lucene.search.OrFilter.getDocIdSet(OrFilter.java:55)
    at org.elasticsearch.common.lucene.search.ApplyAcceptedDocsFilter.getDocIdSet(ApplyAcceptedDocsFilter.java:46)
    at org.apache.lucene.search.FilteredQuery$1.scorer(FilteredQuery.java:130)
    at org.apache.lucene.search.FilteredQuery$RandomAccessFilterStrategy.filteredScorer(FilteredQuery.java:542)
    at org.apache.lucene.search.FilteredQuery$1.scorer(FilteredQuery.java:136)
    at org.apache.lucene.search.QueryWrapperFilter$1.iterator(QueryWrapperFilter.java:59)
    at org.apache.lucene.index.BufferedUpdatesStream.applyQueryDeletes(BufferedUpdatesStream.java:554)
    at org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:287)
    at org.apache.lucene.index.IndexWriter.applyAllDeletesAndUpdates(IndexWriter.java:3271)
    at org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:3262)
    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:421)
    at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:292)
    at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:267)
    at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:257)
    at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:171)
    at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:118)
    at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:58)
    at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:176)
    at org.apache.lucene.search.ReferenceManager.maybeRefresh(ReferenceManager.java:225)
    at org.elasticsearch.index.engine.internal.InternalEngine.refresh(InternalEngine.java:796)
    at org.elasticsearch.index.engine.internal.InternalEngine.delete(InternalEngine.java:692)
    at org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryOperation(InternalIndexShard.java:798)
    at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:268)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:132)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

[2015-01-14 12:01:32,238][DEBUG][index.service ] [Saint Elmo] [mailspool] [1] closing... (reason: [engine failure, message [refresh failed][OutOfMemoryError[Java heap space]]])
[2015-01-14 12:01:32,238][DEBUG][index.shard.service ] [Saint Elmo] [mailspool][1] state: [RECOVERING]->[CLOSED], reason [engine failure, message [refresh failed][OutOfMemoryError[Java heap space]]]
[2015-01-14 12:01:32,315][DEBUG][index.service ] [Saint Elmo] [mailspool] [1] closed (reason: [engine failure, message [refresh failed][OutOfMemoryError[Java heap space]]])

I tried adding a few settings to my elasticsearch.yml, as suggested in the
referenced issue:

index.load_fixed_bitset_filters_eagerly: false
index.warmer.enabled: false
indices.breaker.total.limit: 30%

But none of these settings seems to make a difference for me.
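
In case my configuration simply wasn't being read, here is roughly how I
confirmed the node picked the values up (the node info API reports the
effective settings):

curl -s 'localhost:9200/_nodes/settings?pretty'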

Our mapping is visible here:
http://git.blue-mind.net/gitlist/bluemind/blob/master/esearch/config/templates/mailspool.json

It is used to store a full-text index of emails and uses a parent/child
structure:
The msgBody type contains the full text of the messages and attachments.
The msg type contains user flags (unread, important, the folder it is stored
in, etc.).

We use this structure because "msg" is updated often: mails are frequently
marked as read or moved. The msgBody can be pretty big, so we don't want to
rewrite the whole document when a simple email flag changes.
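
To make this more self-contained, the layout boils down to something like the
sketch below. The field names here are simplified placeholders; the real
definitions are in the mailspool.json linked above:

{
  "mappings": {
    "msgBody": {
      "properties": {
        "content": { "type": "string" }
      }
    },
    "msg": {
      "_parent": { "type": "msgBody" },
      "properties": {
        "is": { "type": "string" }
      }
    }
  }
}

With this layout a child msg is routed to the same shard as its msgBody
parent, so changing a flag only reindexes the small msg document and never
the large body.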

Does this kind of index structure bring to mind a particular bug or required
setting? Is there any rule of thumb for sizing memory relative to index size
on disk?

Regards,
Thomas.
