OutOfMemory exception after a few hours of indexing

Hi,

We have an issue in production where the ES process gets an OutOfMemory
error and stops working.
The error occurs after we index about 6 million documents out of the 12 million
we have in the DB. The documents are quite large: about 70GB for those 6
million documents.
We have 4 threads, and each thread indexes 500 documents every few seconds
using the bulk API, without refreshing the index.
I've attached the schema of the index (AdSchema.txt).
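
Roughly, each thread sends bulk requests shaped like this (a simplified sketch;
the host, IDs and field values here are made up, and the real documents carry
all the fields from the attached schema):

curl -XPOST 'http://localhost:9200/_bulk' --data-binary '
{ "index" : { "_index" : "ad", "_type" : "ad", "_id" : "1" } }
{ "adName" : "Example ad", "campaignID" : 42, "creationDate" : "2013-04-09T10:00:00" }
{ "index" : { "_index" : "ad", "_type" : "ad", "_id" : "2" } }
{ "adName" : "Another ad", "campaignID" : 43, "creationDate" : "2013-04-09T10:00:05" }
'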

The index has the default settings of 5 shards & 1 replica.
We have 2 nodes, each with 64GB of RAM, running Windows Server 2008, ES version
0.20.6. ES_HEAP_SIZE is configured to 20g (I increased it from 10g to 20g,
but that did not solve the problem).
I've attached the ES log.

I've also attached the elasticsearch.yml (it's the default settings with
minor changes).
I guess there is something wrong with the configuration of ES or the JVM,
because we have more than enough memory.

I'd appreciate your help!
Let me know if you need any more info.

Thanks!


There was a problem attaching the files, so here are the index schema and the
ES log inline.

Index schema:

state: open
settings: {
index.analysis.filter.customngramfilter.type: nGram
index.analysis.filter.customngramfilter.max_gram: 150
index.analysis.analyzer.stringngramanalyzer.name: StringNGramAnalyzer
index.analysis.filter.customngramfilter.name: customngramfilter
index.analysis.analyzer.stringngramanalyzer.type: custom
index.analysis.analyzer.stringngramanalyzer.tokenizer: keyword
index.analysis.filter.customngramfilter.min_gram: 2
index.analysis.analyzer.sortlowercaseanalyzer.filter: lowercase
index.analysis.analyzer.sortlowercaseanalyzer.type: custom
index.analysis.analyzer.sortlowercaseanalyzer.tokenizer: keyword
index.number_of_replicas: 1
index.number_of_shards: 5
index.analysis.analyzer.sortlowercaseanalyzer.name: SortLowerCaseAnalyzer
index.analysis.analyzer.stringngramanalyzer.filter: lowercase,customngramfilter
index.version.created: 200599
}
mappings: {
ad: {
_source: {
compress: true
}
properties: {
placementRotation: {
index: no
type: integer
}
placementCreationDate: {
format: dateOptionalTime
type: date
}
thumbnailFileWidth: {
index: no
type: integer
}
classification1: {
type: multi_field
fields: {
sort: {
include_in_all: false
analyzer: sortlowercaseanalyzer
type: string
}
classification1: {
analyzer: stringngramanalyzer
type: string
}
}
}
viewCopiesLink: {
index: no
type: string
}
classification2: {
type: multi_field
fields: {
sort: {
include_in_all: false
analyzer: sortlowercaseanalyzer
type: string
}
classification2: {
analyzer: stringngramanalyzer
type: string
}
}
}
relatedCopies: {
type: integer
}
classification3: {
type: multi_field
fields: {
sort: {
include_in_all: false
analyzer: sortlowercaseanalyzer
type: string
}
classification3: {
analyzer: stringngramanalyzer
type: string
}
}
}
categoryID: {
type: integer
}
classification4: {
type: multi_field
fields: {
sort: {
include_in_all: false
analyzer: sortlowercaseanalyzer
type: string
}
classification4: {
analyzer: stringngramanalyzer
type: string
}
}
}
videoStartMethod: {
type: integer
}
classification5: {
type: multi_field
fields: {
sort: {
include_in_all: false
analyzer: sortlowercaseanalyzer
type: string
}
classification5: {
analyzer: stringngramanalyzer
type: string
}
}
}
deliveryGroupsIDs: {
type: integer
}
thumbnailAccountFileType: {
index: no
type: integer
}
agencyID: {
type: integer
}
advertiserName: {
type: multi_field
fields: {
sort: {
include_in_all: false
analyzer: sortlowercaseanalyzer
type: string
}
advertiserName: {
analyzer: stringngramanalyzer
type: string
}
}
}
thumbnailContentFileName: {
index: no
type: string
}
formatName: {
index: not_analyzed
omit_norms: true
index_options: docs
type: string
}
isCrossChannelAd: {
index: not_analyzed
type: boolean
}
placementLastImpressionTime: {
format: dateOptionalTime
type: date
}
isArchivedForMedia: {
index: not_analyzed
type: boolean
}
formatID: {
type: integer
}
placementName: {
type: multi_field
fields: {
sort: {
include_in_all: false
analyzer: sortlowercaseanalyzer
type: string
}
placementName: {
analyzer: stringngramanalyzer
type: string
}
}
}
placementType: {
index: no
type: integer
}
advertiserID: {
type: integer
}
dataCaptureID: {
type: integer
}
masterAdID: {
type: integer
}
classificationsAccountID: {
type: integer
}
syncUnitID: {
type: integer
}
creativeShop: {
type: multi_field
fields: {
sort: {
include_in_all: false
analyzer: sortlowercaseanalyzer
type: string
}
creativeShop: {
analyzer: stringngramanalyzer
type: string
}
}
}
status: {
type: integer
}
isUsingDeliveryGroups: {
index: not_analyzed
type: boolean
}
isRejected: {
index: no
type: boolean
}
thumbnailFileHeight: {
index: no
type: integer
}
placementActualStartDate: {
index: no
format: dateOptionalTime
type: date
}
iD: {
type: integer
}
isInStreamCompanionAd: {
index: not_analyzed
type: boolean
}
size: {
type: integer
}
syncCampaignID: {
type: integer
}
adPart: {
index: no
type: string
}
adName: {
type: multi_field
fields: {
sort: {
include_in_all: false
analyzer: sortlowercaseanalyzer
type: string
}
adName: {
analyzer: stringngramanalyzer
type: string
}
}
}
publisherID: {
type: integer
}
campaignName: {
type: multi_field
fields: {
campaignName: {
analyzer: stringngramanalyzer
type: string
}
sort: {
include_in_all: false
analyzer: sortlowercaseanalyzer
type: string
}
}
}
isCopyChanged: {
index: no
type: boolean
}
sectionName: {
index: no
type: string
}
placementStartDate: {
index: no
format: dateOptionalTime
type: date
}
placementID: {
type: integer
}
creatorID: {
type: integer
}
placementEndDate: {
index: no
format: dateOptionalTime
type: date
}
modeName: {
index: no
type: string
}
originalTemplateID: {
index: no
type: integer
}
newAdStatus: {
type: integer
}
syncPublisherID: {
type: integer
}
mode: {
type: integer
}
sectionID: {
type: integer
}
syncSectionID: {
type: integer
}
templateID: {
type: integer
}
advancedFeatures: {
type: integer
}
smartVersioningTypeID: {
type: integer
}
archived: {
type: boolean
}
publisherName: {
type: multi_field
fields: {
sort: {
include_in_all: false
analyzer: sortlowercaseanalyzer
type: string
}
publisherName: {
analyzer: stringngramanalyzer
type: string
}
}
}
dimensions: {
type: multi_field
fields: {
sort: {
include_in_all: false
analyzer: sortlowercaseanalyzer
type: string
}
dimensions: {
type: string
}
}
}
sVGroupName: {
type: multi_field
fields: {
sort: {
include_in_all: false
analyzer: sortlowercaseanalyzer
type: string
}
sVGroupName: {
analyzer: stringngramanalyzer
type: string
}
}
}
isArchivedForAgency: {
index: not_analyzed
type: boolean
}
placementLastClickTime: {
format: dateOptionalTime
type: date
}
thumbnailContentPathID: {
index: no
type: integer
}
campaignID: {
type: integer
}
customAdFormatID: {
type: integer
}
actualImpressionsTotal: {
index: no
type: long
}
createdByAdEditor: {
index: not_analyzed
type: boolean
}
creativeTagging: {
type: multi_field
fields: {
sort: {
include_in_all: false
analyzer: sortlowercaseanalyzer
type: string
}
creativeTagging: {
analyzer: stringngramanalyzer
type: string
}
}
}
categoryAccountID: {
type: integer
}
targetAudienceIDs: {
type: integer
}
screenGrabStatusName: {
index: not_analyzed
omit_norms: true
index_options: docs
type: string
}
agencyName: {
type: multi_field
fields: {
sort: {
include_in_all: false
analyzer: sortlowercaseanalyzer
type: string
}
agencyName: {
analyzer: stringngramanalyzer
type: string
}
}
}
syncUnitName: {
type: multi_field
fields: {
sort: {
include_in_all: false
analyzer: sortlowercaseanalyzer
type: string
}
syncUnitName: {
analyzer: stringngramanalyzer
type: string
}
}
}
newAdStatusText: {
index: no
type: string
}
targetAudienceNames: {
type: multi_field
fields: {
targetAudienceNames: {
index: not_analyzed
omit_norms: true
index_options: docs
type: string
}
sort: {
include_in_all: false
analyzer: sortlowercaseanalyzer
type: string
}
}
}
screenGrabStatus: {
index: no
type: integer
}
creationDate: {
format: dateOptionalTime
type: date
}
accountsRolesBitmap: {
properties: {
accountID: {
type: integer
}
role: {
type: long
}
}
}
creatorName: {
type: multi_field
fields: {
sort: {
include_in_all: false
analyzer: sortlowercaseanalyzer
type: string
}
creatorName: {
analyzer: stringngramanalyzer
type: string
}
}
}
dataCaptureFormName: {
type: multi_field
fields: {
sort: {
include_in_all: false
analyzer: sortlowercaseanalyzer
type: string
}
dataCaptureFormName: {
analyzer: stringngramanalyzer
type: string
}
}
}
guidelineID: {
type: integer
}
maxDurationInSec: {
type: integer
}
deliveryGroupsNames: {
type: multi_field
fields: {
sort: {
include_in_all: false
analyzer: sortlowercaseanalyzer
type: string
}
deliveryGroupsNames: {
index: not_analyzed
omit_norms: true
index_options: docs
type: string
}
}
}
lastUpdatedBy: {
type: multi_field
fields: {
sort: {
include_in_all: false
analyzer: sortlowercaseanalyzer
type: string
}
lastUpdatedBy: {
analyzer: stringngramanalyzer
type: string
}
}
}
thumbnailAccountID: {
index: no
type: integer
}
lastUpdatedDate: {
format: dateOptionalTime
type: date
}
notes: {
type: multi_field
fields: {
sort: {
include_in_all: false
analyzer: sortlowercaseanalyzer
type: string
}
notes: {
index: no
type: string
}
}
}
}
}
}
aliases: [ ]
}
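
As a side note, the custom analyzers defined in the settings above can be tried
out with the analyze API, for example (a rough sketch; host and sample text are
made up):

curl -XGET 'http://localhost:9200/ad/_analyze?analyzer=stringngramanalyzer&pretty=true' -d 'Example Campaign Name'

This returns the tokens the stringngramanalyzer produces for that value.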

LOG:
[2013-04-09 10:56:50,870][INFO ][org.elasticsearch.gateway] [ElserNJ01]
recovered [3] indices into cluster_state
[2013-04-09 10:56:53,091][INFO ][org.elasticsearch.cluster.service] [ElserNJ01] added {[ElserNJ02][JBSkSWCRTTyVV6hloDSAWQ][inet[/10.24.4.195:9300]],}, reason: zen-disco-receive(join from node[[ElserNJ02][JBSkSWCRTTyVV6hloDSAWQ][inet[/10.24.4.195:9300]]])
[2013-04-09 11:04:02,372][INFO ][org.elasticsearch.cluster.metadata]
[ElserNJ01] [ad] deleting index
[2013-04-09 11:04:16,775][INFO ][org.elasticsearch.cluster.metadata]
[ElserNJ01] [ad] creating index, cause [api], shards [5]/[1], mappings [ad]
[2013-04-09 11:04:17,068][INFO ][org.elasticsearch.cluster.metadata]
[ElserNJ01] [ad] update_mapping [ad]
[2013-04-09 11:05:27,319][INFO ][org.elasticsearch.cluster.metadata]
[ElserNJ01] [ad] update_mapping [ad] (dynamic)

[2013-04-09 19:36:24,640][WARN ][org.elasticsearch.index.merge.scheduler]
[ElserNJ01] [ad][1] failed to merge
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOfRange(Unknown Source)
at java.lang.String.<init>(Unknown Source)
at org.apache.lucene.index.TermBuffer.toTerm(TermBuffer.java:122)
at org.apache.lucene.index.SegmentTermEnum.term(SegmentTermEnum.java:184)
at org.apache.lucene.index.SegmentMergeInfo.next(SegmentMergeInfo.java:73)
at
org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:501)
at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:428)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:108)
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4263)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3908)
at
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:388)
at
org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:91)
at
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:456)
[2013-04-09 19:36:24,746][WARN ][org.elasticsearch.index.engine.robin]
[ElserNJ01] [ad][1] failed engine
java.lang.IllegalStateException: this writer hit an OutOfMemoryError;
cannot flush
at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3563)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3552)
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:450)
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:399)
at
org.apache.lucene.index.DirectoryReader.doOpenFromWriter(DirectoryReader.java:413)
at
org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:432)
at
org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:375)
at org.apache.lucene.index.IndexReader.openIfChanged(IndexReader.java:508)
at
org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:109)
at
org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:57)
at
org.apache.lucene.search.ReferenceManager.maybeRefresh(ReferenceManager.java:137)
at
org.elasticsearch.index.engine.robin.RobinEngine.refresh(RobinEngine.java:769)
at
org.elasticsearch.index.shard.service.InternalIndexShard.refresh(InternalIndexShard.java:403)
at
org.elasticsearch.index.shard.service.InternalIndexShard$EngineRefresher$1.run(InternalIndexShard.java:733)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
[2013-04-09 19:36:24,742][WARN ][org.elasticsearch.index.engine.robin]
[ElserNJ01] [ad][0] failed engine
java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.<init>(FreqProxTermsWriterPerField.java:194)
at
org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.newInstance(FreqProxTermsWriterPerField.java:204)
at
org.apache.lucene.index.ParallelPostingsArray.grow(ParallelPostingsArray.java:48)
at
org.apache.lucene.index.TermsHashPerField.growParallelPostingsArray(TermsHashPerField.java:157)
at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:460)
at
org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:189)
at
org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:278)
at
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:766)
at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2060)
at
org.elasticsearch.index.engine.robin.RobinEngine.innerIndex(RobinEngine.java:577)
at
org.elasticsearch.index.engine.robin.RobinEngine.index(RobinEngine.java:489)
at
org.elasticsearch.index.shard.service.InternalIndexShard.index(InternalIndexShard.java:330)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:159)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:532)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:430)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
[2013-04-09 19:36:29,091][WARN ][org.elasticsearch.cluster.action.shard] [ElserNJ01] received shard failed for [ad][1], node[1Vvv1tVsRdWodp6gJGKksw], [R], s[STARTED], reason [Failed to perform [bulk/shard] on replica, message [RemoteTransportException[[ElserNJ01][inet[/10.24.4.194:9300]][bulk/shard/replica]]; nested: OutOfMemoryError[GC overhead limit exceeded]; ]]
[2013-04-09 19:36:29,761][WARN ][org.elasticsearch.index.merge.scheduler]
[ElserNJ01] [ad][4] failed to merge
java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.lucene.index.TermBuffer.toTerm(TermBuffer.java:122)
at org.apache.lucene.index.SegmentTermEnum.term(SegmentTermEnum.java:184)
at org.apache.lucene.index.SegmentMergeInfo.next(SegmentMergeInfo.java:73)
at
org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:501)
at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:428)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:108)
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4263)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3908)
at
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:388)
at
org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:91)
at
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:456)
[2013-04-09 19:36:30,791][WARN ][org.elasticsearch.index.engine.robin]
[ElserNJ01] [ad][4] failed engine
java.lang.IllegalStateException: this writer hit an OutOfMemoryError;
cannot flush
at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3563)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3552)
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:450)
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:399)
at
org.apache.lucene.index.DirectoryReader.doOpenFromWriter(DirectoryReader.java:413)
at
org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:432)
at
org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:375)
at org.apache.lucene.index.IndexReader.openIfChanged(IndexReader.java:508)
at
org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:109)
at
org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:57)
at
org.apache.lucene.search.ReferenceManager.maybeRefresh(ReferenceManager.java:137)
at
org.elasticsearch.index.engine.robin.RobinEngine.refresh(RobinEngine.java:769)
at
org.elasticsearch.index.shard.service.InternalIndexShard.refresh(InternalIndexShard.java:403)
at
org.elasticsearch.index.shard.service.InternalIndexShard$EngineRefresher$1.run(InternalIndexShard.java:733)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
[2013-04-09 19:36:32,053][WARN ][org.elasticsearch.cluster.action.shard] [ElserNJ01] sending failed shard for [ad][1], node[1Vvv1tVsRdWodp6gJGKksw], [R], s[STARTED], reason [engine failure, message [IllegalStateException[this writer hit an OutOfMemoryError; cannot flush]]]
[2013-04-09 19:36:32,053][WARN ][org.elasticsearch.cluster.action.shard] [ElserNJ01] received shard failed for [ad][1], node[1Vvv1tVsRdWodp6gJGKksw], [R], s[STARTED], reason [engine failure, message [IllegalStateException[this writer hit an OutOfMemoryError; cannot flush]]]
[2013-04-09 19:36:32,184][WARN ][org.elasticsearch.cluster.action.shard] [ElserNJ01] sending failed shard for [ad][4], node[1Vvv1tVsRdWodp6gJGKksw], [P], s[STARTED], reason [engine failure, message [IllegalStateException[this writer hit an OutOfMemoryError; cannot flush]]]
[2013-04-09 19:36:32,185][WARN ][org.elasticsearch.cluster.action.shard] [ElserNJ01] received shard failed for [ad][4], node[1Vvv1tVsRdWodp6gJGKksw], [P], s[STARTED], reason [engine failure, message [IllegalStateException[this writer hit an OutOfMemoryError; cannot flush]]]
[2013-04-09 19:36:32,185][WARN ][org.elasticsearch.indices.cluster] [ElserNJ01] [ad][4] master [[ElserNJ01][1Vvv1tVsRdWodp6gJGKksw][inet[/10.24.4.194:9300]]] marked shard as started, but shard have not been created, mark shard as failed
[2013-04-09 19:36:32,185][WARN ][org.elasticsearch.cluster.action.shard] [ElserNJ01] sending failed shard for [ad][4], node[1Vvv1tVsRdWodp6gJGKksw], [P], s[STARTED], reason [master [ElserNJ01][1Vvv1tVsRdWodp6gJGKksw][inet[/10.24.4.194:9300]] marked shard as started, but shard have not been created, mark shard as failed]
[2013-04-09 19:36:32,185][WARN ][org.elasticsearch.cluster.action.shard] [ElserNJ01] received shard failed for [ad][4], node[1Vvv1tVsRdWodp6gJGKksw], [P], s[STARTED], reason [master [ElserNJ01][1Vvv1tVsRdWodp6gJGKksw][inet[/10.24.4.194:9300]] marked shard as started, but shard have not been created, mark shard as failed]
[2013-04-09 19:36:32,192][WARN ][org.elasticsearch.cluster.action.shard] [ElserNJ01] sending failed shard for [ad][0], node[1Vvv1tVsRdWodp6gJGKksw], [P], s[STARTED], reason [engine failure, message [OutOfMemoryError[GC overhead limit exceeded]]]
[2013-04-09 19:36:32,192][WARN ][org.elasticsearch.cluster.action.shard] [ElserNJ01] received shard failed for [ad][0], node[1Vvv1tVsRdWodp6gJGKksw], [P], s[STARTED], reason [engine failure, message [OutOfMemoryError[GC overhead limit exceeded]]]
[2013-04-09 19:36:32,221][WARN ][org.elasticsearch.indices.cluster] [ElserNJ01] [ad][0] master [[ElserNJ01][1Vvv1tVsRdWodp6gJGKksw][inet[/10.24.4.194:9300]]] marked shard as started, but shard have not been created, mark shard as failed
[2013-04-09 19:36:32,222][WARN ][org.elasticsearch.cluster.action.shard] [ElserNJ01] sending failed shard for [ad][0], node[1Vvv1tVsRdWodp6gJGKksw], [P], s[STARTED], reason [master [ElserNJ01][1Vvv1tVsRdWodp6gJGKksw][inet[/10.24.4.194:9300]] marked shard as started, but shard have not been created, mark shard as failed]
[2013-04-09 19:36:32,222][WARN ][org.elasticsearch.cluster.action.shard] [ElserNJ01] received shard failed for [ad][0], node[1Vvv1tVsRdWodp6gJGKksw], [P], s[STARTED], reason [master [ElserNJ01][1Vvv1tVsRdWodp6gJGKksw][inet[/10.24.4.194:9300]] marked shard as started, but shard have not been created, mark shard as failed]
[2013-04-09 20:33:23,201][WARN ][org.elasticsearch.index.engine.robin]
[ElserNJ01] [ad][4] failed engine
java.lang.OutOfMemoryError: Java heap space
[2013-04-09 20:33:26,208][WARN ][org.elasticsearch.cluster.action.shard] [ElserNJ01] sending failed shard for [ad][4], node[1Vvv1tVsRdWodp6gJGKksw], [R], s[STARTED], reason [engine failure, message [OutOfMemoryError[Java heap space]]]
[2013-04-09 20:33:26,208][WARN ][org.elasticsearch.cluster.action.shard] [ElserNJ01] received shard failed for [ad][4], node[1Vvv1tVsRdWodp6gJGKksw], [R], s[STARTED], reason [engine failure, message [OutOfMemoryError[Java heap space]]]
[2013-04-09 20:48:53,458][WARN ][org.elasticsearch.cluster.action.shard] [ElserNJ01] received shard failed for [ad][0], node[JBSkSWCRTTyVV6hloDSAWQ], [P], s[STARTED], reason [engine failure, message [OutOfMemoryError[Java heap space]]]
[2013-04-09 21:28:41,858][WARN ][org.elasticsearch.index.engine.robin]
[ElserNJ01] [ad][4] failed engine
java.lang.OutOfMemoryError: Java heap space
[2013-04-09 21:28:42,950][WARN ][org.elasticsearch.cluster.action.shard] [ElserNJ01] sending failed shard for [ad][4], node[1Vvv1tVsRdWodp6gJGKksw], [R], s[STARTED], reason [engine failure, message [OutOfMemoryError[Java heap space]]]
[2013-04-09 21:28:42,950][WARN ][org.elasticsearch.cluster.action.shard] [ElserNJ01] received shard failed for [ad][4], node[1Vvv1tVsRdWodp6gJGKksw], [R], s[STARTED], reason [engine failure, message [OutOfMemoryError[Java heap space]]]
[2013-04-09 21:32:15,818][WARN ][org.elasticsearch.cluster.action.shard] [ElserNJ01] received shard failed for [ad][0], node[JBSkSWCRTTyVV6hloDSAWQ], [R], s[INITIALIZING], reason [Failed to start shard, message [RecoveryFailedException[[ad][0]: Recovery failed from [ElserNJ01][1Vvv1tVsRdWodp6gJGKksw][inet[/10.24.4.194:9300]] into [ElserNJ02][JBSkSWCRTTyVV6hloDSAWQ][inet[/10.24.4.195:9300]]]; nested: RemoteTransportException[[ElserNJ01][inet[/10.24.4.194:9300]][index/shard/recovery/startRecovery]]; nested: RecoveryEngineException[[ad][0] Phase[2] Execution failed]; nested: RemoteTransportException[[ElserNJ02][inet[/10.24.4.195:9300]][index/shard/recovery/prepareTranslog]]; nested: OutOfMemoryError[GC overhead limit exceeded]; ]]
[2013-04-09 21:32:16,099][WARN ][org.elasticsearch.cluster.action.shard] [ElserNJ01] received shard failed for [ad][2], node[JBSkSWCRTTyVV6hloDSAWQ], [P], s[STARTED], reason [engine failure, message [OutOfMemoryError[GC overhead limit exceeded]]]
[2013-04-09 21:32:16,099][WARN ][org.elasticsearch.cluster.action.shard] [ElserNJ01] received shard failed for [ad][2], node[JBSkSWCRTTyVV6hloDSAWQ], [P], s[STARTED], reason [master [ElserNJ01][1Vvv1tVsRdWodp6gJGKksw][inet[/10.24.4.194:9300]] marked shard as started, but shard have not been created, mark shard as failed]
[2013-04-09 21:40:56,946][WARN ][org.elasticsearch.index.engine.robin]
[ElserNJ01] [ad][4] failed engine
java.lang.OutOfMemoryError: Java heap space
[2013-04-09 21:40:59,892][WARN ][org.elasticsearch.cluster.action.shard] [ElserNJ01] sending failed shard for [ad][4], node[1Vvv1tVsRdWodp6gJGKksw], [R], s[INITIALIZING], reason [engine failure, message [OutOfMemoryError[Java heap space]]]
[2013-04-09 21:40:59,892][WARN ][org.elasticsearch.cluster.action.shard] [ElserNJ01] received shard failed for [ad][4], node[1Vvv1tVsRdWodp6gJGKksw], [R], s[INITIALIZING], reason [engine failure, message [OutOfMemoryError[Java heap space]]]
[2013-04-09 21:49:21,964][WARN ][org.elasticsearch.cluster.action.shard] [ElserNJ01] received shard failed for [ad][1], node[JBSkSWCRTTyVV6hloDSAWQ], [P], s[STARTED], reason [engine failure, message [IllegalStateException[this writer hit an OutOfMemoryError; cannot flush]]]
[2013-04-09 21:50:13,980][WARN ][org.elasticsearch.index.engine.robin]
[ElserNJ01] [ad][4] failed engine
java.lang.OutOfMemoryError: Java heap space
[2013-04-09 21:50:20,677][WARN ][org.elasticsearch.cluster.action.shard] [ElserNJ01] sending failed shard for [ad][4], node[1Vvv1tVsRdWodp6gJGKksw], [R], s[INITIALIZING], reason [engine failure, message [OutOfMemoryError[Java heap space]]]
[2013-04-09 21:50:20,677][WARN ][org.elasticsearch.cluster.action.shard] [ElserNJ01] received shard failed for [ad][4], node[1Vvv1tVsRdWodp6gJGKksw], [R], s[INITIALIZING], reason [engine failure, message [OutOfMemoryError[Java heap space]]]
[2013-04-09 22:00:47,973][WARN ][org.elasticsearch.cluster.action.shard] [ElserNJ01] received shard failed for [ad][0], node[JBSkSWCRTTyVV6hloDSAWQ], [R], s[STARTED], reason [engine failure, message [OutOfMemoryError[GC overhead limit exceeded]]]
[2013-04-09 22:02:58,382][WARN ][org.elasticsearch.transport] [ElserNJ01] Received response for a request that has timed out, sent [44843ms] ago, timed out [14804ms] ago, action [discovery/zen/fd/ping], node [[ElserNJ02][JBSkSWCRTTyVV6hloDSAWQ][inet[/10.24.4.195:9300]]], id [438628]
[2013-04-09 22:04:37,752][INFO ][org.elasticsearch.cluster.service] [ElserNJ01] removed {[ElserNJ02][JBSkSWCRTTyVV6hloDSAWQ][inet[/10.24.4.195:9300]],}, reason: zen-disco-node_failed([ElserNJ02][JBSkSWCRTTyVV6hloDSAWQ][inet[/10.24.4.195:9300]]), reason failed to ping, tried [3] times, each with maximum [30s] timeout
[2013-04-09 22:04:37,846][WARN ][org.elasticsearch.indices.cluster]
[ElserNJ01] [ad][4] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [ad][4]
failed recovery
at
org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:228)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.index.engine.EngineCreationFailureException:
[ad][4] failed to create engine
at
org.elasticsearch.index.engine.robin.RobinEngine.start(RobinEngine.java:252)
at
org.elasticsearch.index.shard.service.InternalIndexShard.start(InternalIndexShard.java:279)
at
org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:182)
at
org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:177)
... 3 more
Caused by: java.io.FileNotFoundException: _dem.fnm
at
org.elasticsearch.index.store.Store$StoreDirectory.openInput(Store.java:519)
at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:71)
at org.apache.lucene.index.IndexWriter.getFieldInfos(IndexWriter.java:1212)
at
org.apache.lucene.index.IndexWriter.getCurrentFieldInfos(IndexWriter.java:1228)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1161)
at org.apache.lucene.index.XIndexWriter.<init>(XIndexWriter.java:17)
at
org.elasticsearch.index.engine.robin.RobinEngine.createWriter(RobinEngine.java:1365)
at
org.elasticsearch.index.engine.robin.RobinEngine.start(RobinEngine.java:250)
... 6 more
[2013-04-09 22:04:37,846][WARN ][org.elasticsearch.cluster.action.shard] [ElserNJ01] sending failed shard for [ad][4], node[1Vvv1tVsRdWodp6gJGKksw], [P], s[INITIALIZING], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[ad][4] failed recovery]; nested: EngineCreationFailureException[[ad][4] failed to create engine]; nested: FileNotFoundException[_dem.fnm]; ]]
[2013-04-09 22:04:37,846][WARN ][org.elasticsearch.cluster.action.shard] [ElserNJ01] received shard failed for [ad][4], node[1Vvv1tVsRdWodp6gJGKksw], [P], s[INITIALIZING], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[ad][4] failed recovery]; nested: EngineCreationFailureException[[ad][4] failed to create engine]; nested: FileNotFoundException[_dem.fnm]; ]]

Thanks.


Can you give some information about the JVM you use?

Do you have, besides ES, other code running in this JVM?

Can you exclude memory leaks in your code from being the cause?
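
For example, the JVM each ES node is actually running on can be read from the
nodes info API, something like this (a sketch; host and port are illustrative):

curl -XGET 'http://localhost:9200/_cluster/nodes?jvm=true&pretty=true'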

Note, if you increase the heap from 10g to 20g, it does not simply perform
better if you stay with the standard CMS GC. The reason is that CMS GC was
not designed to work with such large heaps. Exceptions like "GC overhead
limit exceeded" indicate this: GC performance has become so poor that the
JVM gives up and refuses to continue. In this case, I recommend using a
Java 7 JVM and switching to the G1 GC, which is designed for large heaps.
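
On Windows that would mean editing the JAVA_OPTS section of elasticsearch.bat
roughly like this (a sketch; only the GC flags are shown, keep the rest of the
settings as they are):

REM comment out the default CMS collector flags
rem set JAVA_OPTS=%JAVA_OPTS% -XX:+UseParNewGC
rem set JAVA_OPTS=%JAVA_OPTS% -XX:+UseConcMarkSweepGC
rem set JAVA_OPTS=%JAVA_OPTS% -XX:CMSInitiatingOccupancyFraction=75
rem set JAVA_OPTS=%JAVA_OPTS% -XX:+UseCMSInitiatingOccupancyOnly

REM enable G1 on a Java 7 JVM
set JAVA_OPTS=%JAVA_OPTS% -XX:+UseG1GC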

Jörg


Hi, Thank you for your help!

I'm using the latest version of Java (7).
Both servers are dedicated to ES, so no other programs are running on them.
The sync process runs on another server and is written in C# using the NEST
client.
I configured ES to use G1 and ran the sync process again; I hope it will
solve the problem.
I'll let you know.

Thanks again for your help!
Amit.


Hi,

I'm still getting the same error after changing the GC to G1:

[2013-04-11 21:39:35,017][WARN ][org.elasticsearch.index.engine.robin]
[ElserNJ02] [ad][0] failed engine
java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.<init>(FreqProxTermsWriterPerField.java:193)
at
org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.newInstance(FreqProxTermsWriterPerField.java:204)
at
org.apache.lucene.index.ParallelPostingsArray.grow(ParallelPostingsArray.java:48)
at
org.apache.lucene.index.TermsHashPerField.growParallelPostingsArray(TermsHashPerField.java:157)
at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:460)
at
org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:189)
at
org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:278)
at
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:766)
at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2060)
at
org.elasticsearch.index.engine.robin.RobinEngine.innerIndex(RobinEngine.java:577)
at
org.elasticsearch.index.engine.robin.RobinEngine.index(RobinEngine.java:489)
at
org.elasticsearch.index.shard.service.InternalIndexShard.index(InternalIndexShard.java:330)
at
org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:159)
at
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:532)
at
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:430)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
[2013-04-11 21:39:35,547][WARN ][org.elasticsearch.index.merge.scheduler]
[ElserNJ02] [ad][0] failed to merge
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOfRange(Unknown Source)
at java.lang.String.<init>(Unknown Source)
at org.apache.lucene.index.TermBuffer.toTerm(TermBuffer.java:122)
at org.apache.lucene.index.SegmentTermEnum.term(SegmentTermEnum.java:184)
at org.apache.lucene.index.SegmentMergeInfo.next(SegmentMergeInfo.java:73)
at
org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:501)
at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:428)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:108)
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4263)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3908)
at
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:388)
at
org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:91)
at
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:456)

The ES_HEAP_SIZE variable is configured to 10g.
If I decrease the Java heap size I get a heap-space OutOfMemory error,
and if I increase it I get errors from the GC.

Here is the elasticsearch.bat file with the new G1 configuration:

set JAVA_OPTS=%JAVA_OPTS% -Xss512k

REM Enable aggressive optimizations in the JVM
REM - Disabled by default as it might cause the JVM to crash
REM set JAVA_OPTS=%JAVA_OPTS% -XX:+AggressiveOpts

rem set JAVA_OPTS=%JAVA_OPTS% -XX:+UseParNewGC
rem set JAVA_OPTS=%JAVA_OPTS% -XX:+UseConcMarkSweepGC

set JAVA_OPTS=%JAVA_OPTS% -XX:CMSInitiatingOccupancyFraction=75
set JAVA_OPTS=%JAVA_OPTS% -XX:+UseCMSInitiatingOccupancyOnly

REM When running under Java 7
set JAVA_OPTS=%JAVA_OPTS% -XX:+UseCondCardMark
set JAVA_OPTS=%JAVA_OPTS% -XX:+UnlockExperimentalVMOptions
set JAVA_OPTS=%JAVA_OPTS% -XX:+UseG1GC
set JAVA_OPTS=%JAVA_OPTS% -XX:MaxGCPauseMillis=50
set JAVA_OPTS=%JAVA_OPTS% -XX:GCPauseIntervalMillis=100

REM GC logging options -- uncomment to enable
set JAVA_OPTS=%JAVA_OPTS% -XX:+PrintGCDetails
set JAVA_OPTS=%JAVA_OPTS% -XX:+PrintGCTimeStamps
set JAVA_OPTS=%JAVA_OPTS% -XX:+PrintClassHistogram
set JAVA_OPTS=%JAVA_OPTS% -XX:+PrintTenuringDistribution
set JAVA_OPTS=%JAVA_OPTS% -XX:+PrintGCApplicationStoppedTime
set JAVA_OPTS=%JAVA_OPTS% -Xloggc:/var/log/elasticsearch/gc.log

REM Causes the JVM to dump its heap on OutOfMemory.
set JAVA_OPTS=%JAVA_OPTS% -XX:+HeapDumpOnOutOfMemoryError
REM The path to the heap dump location, note directory must exists and have enough
REM space for a full heap dump.
set JAVA_OPTS=%JAVA_OPTS% -XX:HeapDumpPath=$ES_HOME/logs/heapdump.hprof

What am I doing wrong?
Thanks!


What JVM do you use? -XX:+UnlockExperimentalVMOptions looks like you are
using Java 6. It's very challenging to use Java 6 with G1. I recommend
updating to the latest Java 7 and just using -XX:+UseG1GC.
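
Concretely, the G1 section of the pasted file could then shrink to something
like this (a sketch; the experimental-era flags are simply dropped):

REM Java 7: plain G1, no experimental flags needed
set JAVA_OPTS=%JAVA_OPTS% -XX:+UseG1GC
set JAVA_OPTS=%JAVA_OPTS% -XX:MaxGCPauseMillis=50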

Another hint: you are using ES 0.20.6, but 0.90 (or any later version)
offers a lot of memory improvements because of Lucene 4.x.
If these are not options for you, or they don't help, and you can't
improve your application to better handle the workload, I think you
should consider adding more nodes, since your application is exhausting
all the memory.

Jörg

On 14.04.13 09:28, Amit Bh wrote:

set JAVA_OPTS=%JAVA_OPTS% -XX:+UseCondCardMark
set JAVA_OPTS=%JAVA_OPTS% -XX:+UnlockExperimentalVMOptions
set JAVA_OPTS=%JAVA_OPTS% -XX:+UseG1GC
set JAVA_OPTS=%JAVA_OPTS% -XX:MaxGCPauseMillis=50
set JAVA_OPTS=%JAVA_OPTS% -XX:GCPauseIntervalMillis=100
