"Failed to merge" - java.io.FileNotFoundException

Hi guys,

I've been indexing millions of docs in an ES cluster with 4 processes
that launch about 10 threads each, and each of those threads uses
a Transport client for indexing.

For one day everything was fine, but today we started to experience
lots of slow queries to ES and found the following output in the logs
of almost every server (there are 6 ES servers):

[2012-01-12 00:20:38,510][WARN ][index.merge.scheduler ] [Phage] [items][14] failed to merge
java.io.FileNotFoundException: _b7i6_1.del
    at org.elasticsearch.index.store.Store$StoreDirectory.fileLength(Store.java:378)
    at org.apache.lucene.index.SegmentInfo.sizeInBytes(SegmentInfo.java:303)
    at org.apache.lucene.index.MergePolicy$OneMerge.totalBytesSize(MergePolicy.java:174)
    at org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:79)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:456)

I've 'gisted' the cluster state here https://gist.github.com/1602222

What could be the reasons for this warning?
The cluster health shows everything is OK, so I guess no information has
been lost, but since these are production servers, I'd rather know as
soon as possible whether this can be a problem.

Thanks in advance!

Frederic

Did someone delete some files from the nodes? Were there any failures
before it started to happen? Which version are you using?
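
To answer these questions across several servers, it can help to pull every merge failure out of each node's log and list which shard and file it names. Below is a minimal sketch in Python, assuming the log line layout shown in the original post. The function name `missing_files_by_shard` and the `SAMPLE` string are hypothetical helpers for illustration, not part of Elasticsearch.

```python
import re
from collections import defaultdict

# Header of a merge-failure entry, in the layout seen in the post, e.g.
# [2012-01-12 00:20:38,510][WARN ][index.merge.scheduler ] [Phage] [items][14] failed to merge
HEADER = re.compile(
    r"\[(?P<ts>[^\]]+)\]\[WARN\s*\]\[index\.merge\.scheduler\s*\]\s*"
    r"\[(?P<node>[^\]]+)\]\s*\[(?P<index>[^\]]+)\]\[(?P<shard>\d+)\]\s*failed to merge"
)
# The exception line that names the missing file.
MISSING = re.compile(r"java\.io\.FileNotFoundException:\s*(?P<file>\S+)")


def missing_files_by_shard(log_text):
    """Return {(index, shard): [missing file names]} for merge failures."""
    # Mail and log viewers may wrap long lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    headers = list(HEADER.finditer(flat))
    result = defaultdict(list)
    for i, header in enumerate(headers):
        # Only look for the exception between this header and the next one.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(flat)
        found = MISSING.search(flat, header.end(), end)
        if found:
            key = (header.group("index"), int(header.group("shard")))
            result[key].append(found.group("file"))
    return dict(result)


# Sample taken from the log output quoted in the original post.
SAMPLE = """
[2012-01-12 00:20:38,510][WARN ][index.merge.scheduler ] [Phage]
[items][14] failed to merge
java.io.FileNotFoundException: _b7i6_1.del
at org.elasticsearch.index.store.Store
$StoreDirectory.fileLength(Store.java:378)
"""

if __name__ == "__main__":
    print(missing_files_by_shard(SAMPLE))  # {('items', 14): ['_b7i6_1.del']}
```

Running this against each node's log file would show whether the failures cluster on a few shards or are spread across the whole index, which narrows down where to look for deleted or missing files.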

Thanks for your reply, and sorry for the missing info:

  • I'm using ES 0.18.5
  • AFAIK nobody should have deleted any files from those servers
  • On some servers, only the output mentioned above has been logged so
    far (last night a similar warning was logged, but with a different
    file name: java.io.FileNotFoundException: _cqjh_2.del).
    On some other servers there are only DEBUG stack traces like the
    following

[2012-01-10 15:08:59,178][DEBUG][action.admin.cluster.node.info] [James Howlett] failed to execute on node [pfh3QO6ATo2A4wcEk2Cq0g]
org.elasticsearch.transport.RemoteTransportException: [Gog][inet[/172.16.138.113:9300]][/cluster/nodes/info/node]
Caused by: java.lang.NullPointerException
    at org.elasticsearch.http.HttpInfo.writeTo(HttpInfo.java:65)

I assumed this was expected, since I removed a server from the cluster
whose IP was configured in the 'discovery.zen.ping.unicast.hosts'
list.

I'm planning to restart each one of the ES instances because of some
other issues (too many idle open connections). Could that possibly
recreate the missing files?

Thanks a lot,

Were you able to resolve your issue? If so, how?
I'm facing a similar issue and my queries are getting really slow. I'm
not sure how to fix it.

Thanks
