Read past EOF exception on .tis and .fdt file

Hi all,

we recently had an issue where ES reported a file corruption (more
specifically, a read past EOF error) after a longer period of imports and
deletions. ES reported long garbage collection times on a few nodes, then
was quiet again until it started throwing the EOF exception. From what I
could find online, this kind of exception can happen after an OutOfMemory
error or when the disk runs out of space, but neither occurred in our
scenario, so I don't understand how this could happen in the first place.
We're running ES 1.3.4 and migrated a while ago from 0.20.

[2015-02-06 01:15:11.971 GMT] INFO ||||||
elasticsearch[3-6][scheduler][T#1] org.elasticsearch.monitor.jvm [3-6]
[gc][young][618719][105280] duration [962ms], collections [1]/[1.6s], total
[962ms]/[16.8m], memory [435.2mb]->[425.9mb]/[1.9gb], all_pools {[young]
[28.2mb]->[5.3mb]/[546.1mb]}{[survivor] [6.3mb]->[6.3mb]/[68.2mb]}{[old]
[400.5mb]->[414.2mb]/[1.3gb]}

[2015-02-06 07:20:44.188 GMT] WARN |||||| elasticsearch[3-6][[order][3]:
Lucene Merge Thread #17] org.elasticsearch.index.merge.scheduler [3-6]
[order][3] failed to merge

java.io.EOFException: read past EOF:
NIOFSIndexInput(path="/data/cluster1/nodes/0/indices/order/3/index/_dr3z.fdt")

    at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:144)
    at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:116)
    at org.apache.lucene.codecs.lucene3x.Lucene3xStoredFieldsReader.readField(Lucene3xStoredFieldsReader.java:273)
    at org.apache.lucene.codecs.lucene3x.Lucene3xStoredFieldsReader.visitDocument(Lucene3xStoredFieldsReader.java:240)
    at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:341)
    at org.apache.lucene.index.FilterAtomicReader.document(FilterAtomicReader.java:389)
    at org.apache.lucene.index.IndexReader.document(IndexReader.java:460)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:355)
    at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:332)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:100)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4225)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3820)
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
    at org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:106)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)

We ran CheckIndex, and it reported a read past EOF exception for this .fdt
file and the corresponding .tis file.

  2 of 29: name=_dr3z docCount=575018
    codec=Lucene3x
    compound=false
    numFiles=11
    size (MB)=512.496
    diagnostics = {os=Linux, os.version=3.1.6, mergeFactor=10, source=merge, lucene.version=3.6.2 1423725 - rmuir - 2012-12-18 19:45:40, os.arch=amd64, mergeMaxNumSegments=-1, java.version=1.7.0_51, java.vendor=Oracle Corporation}
    has deletions [delGen=422]
    test: open reader.........OK
    test: check integrity.....OK
    test: check live docs.....OK [419388 deleted docs]
    test: fields..............OK [132 fields]
    test: field norms.........OK [48 fields]
    test: terms, freq, prox...ERROR: java.io.EOFException: seek past EOF: MMapIndexInput(path="/data/cluster1/nodes/0/indices/order/3/index/_dr3z.tis")
    java.io.EOFException: seek past EOF: MMapIndexInput(path="/data/cluster1/nodes/0/indices/order/3/index/_dr3z.tis")
        at org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.seek(ByteBufferIndexInput.java:431)
        at org.apache.lucene.codecs.lucene3x.SegmentTermEnum.seek(SegmentTermEnum.java:127)
        at org.apache.lucene.codecs.lucene3x.TermInfosReaderIndex.seekEnum(TermInfosReaderIndex.java:153)
        at org.apache.lucene.codecs.lucene3x.TermInfosReader.seekEnum(TermInfosReader.java:287)
        at org.apache.lucene.codecs.lucene3x.TermInfosReader.seekEnum(TermInfosReader.java:232)
        at org.apache.lucene.codecs.lucene3x.Lucene3xFields$PreTermsEnum.seekCeil(Lucene3xFields.java:750)
        at org.apache.lucene.index.Terms.getMax(Terms.java:182)
        at org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:795)
        at org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:1325)
        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:631)
        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2051)
    test: stored fields.......ERROR [read past EOF: MMapIndexInput(path="/data/cluster1/nodes/0/indices/order/3/index/_dr3z.fdt")]
    java.io.EOFException: read past EOF: MMapIndexInput(path="/data/cluster1/nodes/0/indices/order/3/index/_dr3z.fdt")
        at org.apache.lucene.store.ByteBufferIndexInput.readBytes(ByteBufferIndexInput.java:104)
        at org.apache.lucene.codecs.lucene3x.Lucene3xStoredFieldsReader.readField(Lucene3xStoredFieldsReader.java:273)
        at org.apache.lucene.codecs.lucene3x.Lucene3xStoredFieldsReader.visitDocument(Lucene3xStoredFieldsReader.java:240)
        at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:341)
        at org.apache.lucene.index.IndexReader.document(IndexReader.java:460)
        at org.apache.lucene.index.CheckIndex.testStoredFields(CheckIndex.java:1361)
        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:634)
        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2051)
    test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
    test: docvalues...........OK [0 docvalues fields; 0 BINARY; 0 NUMERIC; 0 SORTED; 0 SORTED_NUMERIC; 0 SORTED_SET]
    FAILED
    WARNING: fixIndex() would remove reference to this segment; full exception:
    java.lang.RuntimeException: Term Index test failed
        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:646)
        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2051)

One strange thing is that this segment is the only one still on Lucene
3.6.2, while the others are on 4.9.1. The .tis file was reported as missing
only once in our logs, and only some time after the complaints about the
.fdt file started.

[2015-02-06 10:31:56.060] WARN
elasticsearch[blade5-2][clusterService#updateTask][T#1]
org.elasticsearch.index.store [5-2] [order][3] Can't open file to read checksums
java.io.FileNotFoundException: No such file [_dr3z.tis]
    at org.elasticsearch.index.store.DistributorDirectory.getDirectory(DistributorDirectory.java:176)
    at org.elasticsearch.index.store.DistributorDirectory.getDirectory(DistributorDirectory.java:144)
    at org.elasticsearch.index.store.DistributorDirectory.fileLength(DistributorDirectory.java:113)
    at org.elasticsearch.index.store.Store$MetadataSnapshot.buildMetadata(Store.java:482)
    at org.elasticsearch.index.store.Store$MetadataSnapshot.<init>(Store.java:456)
    at org.elasticsearch.index.store.Store.getMetadata(Store.java:154)
    at org.elasticsearch.index.store.Store.getMetadata(Store.java:143)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyInitializingShard(IndicesClusterStateService.java:728)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewOrUpdatedShards(IndicesClusterStateService.java:580)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:184)
    at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:444)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:153)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

We fixed the issue by shutting down the cluster and running CheckIndex on
the affected nodes, but I would like to know: is there a less invasive way
to do this, should the issue happen again?
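For reference, this is roughly how a manual CheckIndex run looks; the jar
and shard paths below are examples and will differ per install (the Lucene
core jar ships in the Elasticsearch lib directory):

```shell
# Example paths only -- adjust to your install and shard layout.
LUCENE_JAR=/usr/share/elasticsearch/lib/lucene-core-4.9.1.jar
SHARD_INDEX=/data/cluster1/nodes/0/indices/order/3/index

# First pass: report only, changes nothing on disk.
java -cp "$LUCENE_JAR" org.apache.lucene.index.CheckIndex "$SHARD_INDEX"

# Only after reviewing the report (and backing up the shard):
# -fix drops any unrecoverable segments, losing the documents in them.
java -cp "$LUCENE_JAR" org.apache.lucene.index.CheckIndex "$SHARD_INDEX" -fix
```

Run this only while the node is shut down, since CheckIndex must have
exclusive access to the index files.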

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/2d02ebec-75a2-4ef7-94dd-7034cb63af8e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

plus 1 for a less invasive way to recover data

I had a similar issue today on one of our test servers, where I eventually
managed to recover my index by running CheckIndex on one of my shards. In
my case, I also had to remove the translog recovery file to actually get
the cluster green; that step seems to be omitted in most mentions of the
CheckIndex tool in combination with Elasticsearch.

Anyway, after this I ran CheckIndex on some other shards that were
supposedly fine and was a bit surprised when it reported and fixed some
errors there too.
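For anyone repeating this, a quick way to confirm the shards actually came
back after a repair and restart (standard APIs; host and port are examples):

```shell
# Overall cluster state: should end up "green" once all shards are assigned.
curl 'localhost:9200/_cluster/health?pretty'

# Per-shard view: shows STARTED / INITIALIZING / UNASSIGNED for each shard.
curl 'localhost:9200/_cat/shards?v'
```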

This makes me wonder whether there should be a proper API around this tool
in Elasticsearch that lets you run corruption checks across the whole
cluster and fix problems. It would be nice to be able to run some
diagnostics to confirm your data is actually 100% OK. I know Elasticsearch
runs more and more checks on startup involving checksums, etc., but those
checks apparently failed to detect problems that CheckIndex thought needed
fixing. That sounds like something most admins would want to know about
their cluster.

On Thursday, February 12, 2015 at 10:44:26 AM UTC+1, Philipp Knobel wrote:


ES has the index.shard.check_on_startup setting to run CheckIndex when a
shard starts up:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules.html
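A sketch of enabling it via the index settings API (index name is an
example; "checksum" verifies file checksums only, "true" runs a full and
much slower CheckIndex, and "fix" additionally drops corrupt segments,
which loses the documents in them). The setting is applied when a shard
starts up, so close and reopen the index:

```shell
curl -XPOST 'localhost:9200/order/_close'
curl -XPUT 'localhost:9200/order/_settings' -d '{
  "index.shard.check_on_startup": "checksum"
}'
curl -XPOST 'localhost:9200/order/_open'
```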

Mike McCandless

http://blog.mikemccandless.com

On Wed, Feb 18, 2015 at 1:17 PM, Jilles van Gurp jillesvangurp@gmail.com
wrote:

