Hi all,
we recently ran into an issue where ES reported a file corruption (more
specifically a "read past EOF" error) after imports and deletions over a
longer timeframe. ES reported long garbage collection times on a few
nodes, but was then silent again until it started to throw the EOF
exception. From what I could find online, this kind of exception can
happen when an OutOfMemory error occurs or no space is left on disk;
neither was the case in our scenario, so I don't understand how this
could happen in the first place. We're running ES 1.3.4 and migrated a
while ago from 0.20.
[2015-02-06 01:15:11.971 GMT] INFO ||||||
elasticsearch[3-6][scheduler][T#1] org.elasticsearch.monitor.jvm [3-6]
[gc][young][618719][105280] duration [962ms], collections [1]/[1.6s], total
[962ms]/[16.8m], memory [435.2mb]->[425.9mb]/[1.9gb], all_pools {[young]
[28.2mb]->[5.3mb]/[546.1mb]}{[survivor] [6.3mb]->[6.3mb]/[68.2mb]}{[old]
[400.5mb]->[414.2mb]/[1.3gb]}
[2015-02-06 07:20:44.188 GMT] WARN |||||| elasticsearch[3-6][[order][3]:
Lucene Merge Thread #17] org.elasticsearch.index.merge.scheduler [3-6]
[order][3] failed to merge
java.io.EOFException: read past EOF: NIOFSIndexInput(path="/data/cluster1/nodes/0/indices/order/3/index/_dr3z.fdt")
    at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:144)
    at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:116)
    at org.apache.lucene.codecs.lucene3x.Lucene3xStoredFieldsReader.readField(Lucene3xStoredFieldsReader.java:273)
    at org.apache.lucene.codecs.lucene3x.Lucene3xStoredFieldsReader.visitDocument(Lucene3xStoredFieldsReader.java:240)
    at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:341)
    at org.apache.lucene.index.FilterAtomicReader.document(FilterAtomicReader.java:389)
    at org.apache.lucene.index.IndexReader.document(IndexReader.java:460)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:355)
    at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:332)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:100)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4225)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3820)
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
    at org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:106)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
We ran CheckIndex, and it reported a "read past EOF" exception for this
.fdt file and the corresponding .tis file:
  2 of 29: name=_dr3z docCount=575018
    codec=Lucene3x
    compound=false
    numFiles=11
    size (MB)=512.496
    diagnostics = {os=Linux, os.version=3.1.6, mergeFactor=10, source=merge, lucene.version=3.6.2 1423725 - rmuir - 2012-12-18 19:45:40, os.arch=amd64, mergeMaxNumSegments=-1, java.version=1.7.0_51, java.vendor=Oracle Corporation}
    has deletions [delGen=422]
    test: open reader.........OK
    test: check integrity.....OK
    test: check live docs.....OK [419388 deleted docs]
    test: fields..............OK [132 fields]
    test: field norms.........OK [48 fields]
    test: terms, freq, prox...ERROR: java.io.EOFException: seek past EOF: MMapIndexInput(path="/data/cluster1/nodes/0/indices/order/3/index/_dr3z.tis")
java.io.EOFException: seek past EOF: MMapIndexInput(path="/data/cluster1/nodes/0/indices/order/3/index/_dr3z.tis")
    at org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.seek(ByteBufferIndexInput.java:431)
    at org.apache.lucene.codecs.lucene3x.SegmentTermEnum.seek(SegmentTermEnum.java:127)
    at org.apache.lucene.codecs.lucene3x.TermInfosReaderIndex.seekEnum(TermInfosReaderIndex.java:153)
    at org.apache.lucene.codecs.lucene3x.TermInfosReader.seekEnum(TermInfosReader.java:287)
    at org.apache.lucene.codecs.lucene3x.TermInfosReader.seekEnum(TermInfosReader.java:232)
    at org.apache.lucene.codecs.lucene3x.Lucene3xFields$PreTermsEnum.seekCeil(Lucene3xFields.java:750)
    at org.apache.lucene.index.Terms.getMax(Terms.java:182)
    at org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:795)
    at org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:1325)
    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:631)
    at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2051)
    test: stored fields.......ERROR [read past EOF: MMapIndexInput(path="/data/cluster1/nodes/0/indices/order/3/index/_dr3z.fdt")]
java.io.EOFException: read past EOF: MMapIndexInput(path="/data/cluster1/nodes/0/indices/order/3/index/_dr3z.fdt")
    at org.apache.lucene.store.ByteBufferIndexInput.readBytes(ByteBufferIndexInput.java:104)
    at org.apache.lucene.codecs.lucene3x.Lucene3xStoredFieldsReader.readField(Lucene3xStoredFieldsReader.java:273)
    at org.apache.lucene.codecs.lucene3x.Lucene3xStoredFieldsReader.visitDocument(Lucene3xStoredFieldsReader.java:240)
    at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:341)
    at org.apache.lucene.index.IndexReader.document(IndexReader.java:460)
    at org.apache.lucene.index.CheckIndex.testStoredFields(CheckIndex.java:1361)
    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:634)
    at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2051)
    test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
    test: docvalues...........OK [0 docvalues fields; 0 BINARY; 0 NUMERIC; 0 SORTED; 0 SORTED_NUMERIC; 0 SORTED_SET]
  FAILED
    WARNING: fixIndex() would remove reference to this segment; full exception:
java.lang.RuntimeException: Term Index test failed
    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:646)
    at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2051)
One strange thing is that this segment is the only one still on Lucene
3.6.2; all the others are on 4.9.1. The .tis file was reported only once
in our logs (as not found), and only quite some time after the
complaints about the .fdt file started:
[2015-02-06 10:31:56.060] WARN
elasticsearch[blade5-2][clusterService#updateTask][T#1]
org.elasticsearch.index.store [5-2] [order][3] Can't open file to read checksums
java.io.FileNotFoundException: No such file [_dr3z.tis]
    at org.elasticsearch.index.store.DistributorDirectory.getDirectory(DistributorDirectory.java:176)
    at org.elasticsearch.index.store.DistributorDirectory.getDirectory(DistributorDirectory.java:144)
    at org.elasticsearch.index.store.DistributorDirectory.fileLength(DistributorDirectory.java:113)
    at org.elasticsearch.index.store.Store$MetadataSnapshot.buildMetadata(Store.java:482)
    at org.elasticsearch.index.store.Store$MetadataSnapshot.<init>(Store.java:456)
    at org.elasticsearch.index.store.Store.getMetadata(Store.java:154)
    at org.elasticsearch.index.store.Store.getMetadata(Store.java:143)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyInitializingShard(IndicesClusterStateService.java:728)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewOrUpdatedShards(IndicesClusterStateService.java:580)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:184)
    at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:444)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:153)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
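Incidentally, the per-segment Lucene version (which is how we noticed the lone 3.6.2 segment) can also be checked at runtime via the segments API, so no node shutdown is needed just to inspect this. A sketch, assuming the default HTTP port and our "order" index:

```shell
# Each shard entry in the response lists its segments, including a
# "version" field with the Lucene version that wrote the segment.
curl -s 'http://localhost:9200/order/_segments?pretty'
```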
We fixed the issue by shutting down the cluster and running CheckIndex
on the affected nodes, but I would like to know whether there is a less
invasive way to do this, should the issue happen again?
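For reference, the offline fix we ran was along these lines (the jar location is from our install and may differ on yours; ES 1.3.4 ships lucene-core-4.9.1, and the shard path matches the one in the logs above):

```shell
# Stop the node first: CheckIndex must never run against an index that
# is still open for writing.
# -fix removes references to unrecoverable segments (losing their docs),
# so back up the shard directory before running it.
java -cp /usr/share/elasticsearch/lib/lucene-core-4.9.1.jar \
  org.apache.lucene.index.CheckIndex \
  /data/cluster1/nodes/0/indices/order/3/index -fix
```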
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/2d02ebec-75a2-4ef7-94dd-7034cb63af8e%40googlegroups.com.