"failed to merge java.io.EOFException: read past EOF: NIOFSIndexInput("

Hi all,

I hope someone will be able to shed some light on this issue: we're
experiencing a problem affecting a single Elasticsearch server which is
being used to store and index Tomcat and syslog data pushed into ES via
logstash.

The following entries are coming up in the elasticsearch log on the server:

[2013-07-10 06:50:31,699][WARN ][index.shard.service ] [chiana]
[logstash-2013.07.10][3] Failed to perform scheduled engine refresh
org.elasticsearch.index.engine.RefreshFailedEngineException:
[logstash-2013.07.10][3] Refresh failed

and

[2013-07-10 06:50:34,376][WARN ][index.merge.scheduler ] [chiana]
[logstash-2013.07.10][2] failed to merge
java.io.EOFException: read past EOF:
NIOFSIndexInput(path="/var/lib/elasticsearch/logstash/nodes/0/indices/logstash-2013.07.10/2/index/_egi.fnm")

The files at the paths indicated in the "failed to merge" messages all
appear to be zero length; I'm not sure whether this is significant.
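A minimal way to enumerate these, assuming the path layout from the
NIOFSIndexInput message above, is a find over the shard's index directory:

```shell
# List zero-length files in one shard's Lucene index directory.
# SHARD_DIR is taken from the NIOFSIndexInput path in the log above;
# adjust it for other shards/indices on your node.
SHARD_DIR=/var/lib/elasticsearch/logstash/nodes/0/indices/logstash-2013.07.10/2/index
find "$SHARD_DIR" -type f -size 0 -print
```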

The logstash server will continue to feed logs into Elasticsearch in spite
of these messages appearing, but eventually it falls over, after an
indeterminate length of time.
When the logstash server is unable to index into ES, it appears as though
the ES server is rejecting connections, and logstash shows "unable to index
event" messages in its logs... then the indices appear to be corrupt, and
at this point I've needed to stop the ES daemon and run the Lucene index
fix described here -
http://elasticsearch-users.115913.n3.nabble.com/Shard-index-gone-bad-anyone-know-how-to-fix-this-java-io-EOFException-read-past-EOF-NIOFSIndexInput-tp4027683p4028934.html
Once I restart the ES daemon, all seems okay for a while... then the
problem starts happening again :-/

Is it possible that we're reaching some sort of limit on the size of the
documents being pushed into ES by logstash? Is there any other reason we
would be seeing the log entries described above?

Thanks in advance!
Andrew

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hey,

is it possible that there is an exception in your logfiles before this
happens, which could shed some more light on this issue? Maybe you are
running out of file descriptors (wildly speculating here), or an
OutOfMemoryException happened, or something...
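To rule out the file descriptor theory, something along these lines should
show the effective limits (a sketch; the pgrep pattern is a guess and may
need adjusting for your setup):

```shell
# Open-file limit for the current shell:
ulimit -n

# Limit actually applied to a running Elasticsearch process, if any
# (the pgrep pattern is a guess; match whatever your ES java command
# line looks like):
ES_PID=$(pgrep -f elasticsearch | head -n 1)
if [ -n "$ES_PID" ]; then
  grep 'Max open files' "/proc/$ES_PID/limits"
fi
```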

--Alex

On Wed, Jul 10, 2013 at 5:29 PM, Andrew Stangl <andrewstangl@gmail.com> wrote:

<.. snip >

It looks like there was temporarily not enough disk space while the Lucene
index was written. If so, the index is corrupt and must be
checked/repaired with the index.shard.check_on_startup setting on node startup.
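For example, in elasticsearch.yml (a config sketch; the check runs per
shard at startup and is slow on large shards, so turn it off again once
the index is clean):

```yaml
# elasticsearch.yml - check (and where possible repair) each shard's
# Lucene index when the shard is opened; slow for large shards.
index.shard.check_on_startup: true
```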

Jörg

On 10.07.13 17:55, Alexander Reelsen wrote:

<.. snip >

Hey Alex,

Thanks for your reply... unfortunately it appears as though that's the
first occurrence of an exception related to the issue in the ES logs; the
latest series of exceptions occurred after a few hours of silence in the
logs, following an unrelated error caused by a malformed query in the
Kibana interface:

[2013-07-10 15:18:27,142][DEBUG][action.search.type ] [chiana]
[logstash-2013.07.10][4], node[GRXNW9dnR1KPVejX4oCnpQ], [P], s[STARTED]:
Failed to execute [org.elasticsearch.action.search.SearchRequest@1513f519]
org.elasticsearch.search.SearchParseException: [logstash-2013.07.10][4]:
from[-1],size[-1]: Parse Failure [Failed to parse source
[{"facets":{"chart0":{"date_histogram":{"field":"@timestamp","interval":"5m"},"facet_filter":{"fquery":{"query":{"filtered":{"query":{"query_string":{"query":"@type:\"varnish\" AND @fields.uri:\"confirmation\""}},"filter":{"range":{"@timestamp":{"from":"2013-07-10T09:15:37.365Z","to":"2013-07-10T15:15:37.365Z"}}}}}}}}},"size":0}]]

<.. snip >

[2013-07-10 18:22:49,787][WARN ][index.shard.service ] [chiana]
[logstash-2013.07.10][2] Failed to perform scheduled engine refresh
org.elasticsearch.index.engine.RefreshFailedEngineException:
[logstash-2013.07.10][2] Refresh failed
    at org.elasticsearch.index.engine.robin.RobinEngine.refresh(RobinEngine.java:787)
    at org.elasticsearch.index.shard.service.InternalIndexShard.refresh(InternalIndexShard.java:403)
    at org.elasticsearch.index.shard.service.InternalIndexShard$EngineRefresher$1.run(InternalIndexShard.java:731)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.EOFException: read past EOF:
NIOFSIndexInput(path="/var/lib/elasticsearch/logstash/nodes/0/indices/logstash-2013.07.10/2/index/_19c7.fdx")
    at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:264)
    at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:40)
    at org.apache.lucene.store.DataInput.readInt(DataInput.java:86)
    at org.apache.lucene.store.BufferedIndexInput.readInt(BufferedIndexInput.java:179)
    at org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:138)
    at org.apache.lucene.index.SegmentCoreReaders.openDocStores(SegmentCoreReaders.java:234)
    at org.apache.lucene.index.SegmentReader.openDocStores(SegmentReader.java:138)
    at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:704)
    at org.apache.lucene.index.IndexWriter$ReaderPool.getReadOnlyClone(IndexWriter.java:654)
    at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:142)
    at org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:36)
    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:451)
    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:399)
    at org.apache.lucene.index.DirectoryReader.doOpenFromWriter(DirectoryReader.java:413)
    at org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:432)
    at org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:375)
    at org.apache.lucene.index.IndexReader.openIfChanged(IndexReader.java:508)
    at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:109)
    at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:57)
    at org.apache.lucene.search.ReferenceManager.maybeRefresh(ReferenceManager.java:137)
    at org.elasticsearch.index.engine.robin.RobinEngine.refresh(RobinEngine.java:768)
    ... 5 more

That last message is repeated continuously for what appears to be every
index.

Thanks,
Andrew


Hi Jörg,

The server currently has more than 90% free space on the partition with the
elasticsearch data store; this is a completely fresh index created
automatically by logstash. We did originally experience disk space issues,
but subsequently added a very large volume and started fresh.

I'm now going to attempt to start the node with
index.shard.check_on_startup: true; I'll let you know how it goes.

Thanks,
Andrew

On Wed, Jul 10, 2013 at 5:46 PM, Jörg Prante <joergprante@gmail.com> wrote:

<.. snip >

Hey Andrew,

first, this is rather a Lucene exception, so we should definitely find out
why it happens in Elasticsearch; maybe we can prevent it. A few things to
rule out:

a) You are not accessing the same data directory with two elasticsearch
instances/processes (this exception can happen when two Lucene index
writers access the same data)?
b) You are not running your data directory on an NFS share or some similar
network share?
c) You are not running elasticsearch with one of the problematic JVM
versions for Lucene, listed at http://wiki.apache.org/lucene-java/JavaBugs ?
d) You did not run out of inodes, judging from your setup?
e) The most important question: can you give us some data to reproduce this?
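For (b) and (d), a couple of quick commands would confirm (the data path
below is assumed from your earlier log output):

```shell
# Filesystem type under the ES data path - 'nfs' here would be a
# red flag for point (b). Path assumed from the log output above.
DATA_DIR=/var/lib/elasticsearch
df -T "$DATA_DIR"

# Inode usage on the same filesystem - IUse% near 100% means no inodes
# left even with plenty of free space, point (d):
df -i "$DATA_DIR"
```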

--Alex

On Wed, Jul 10, 2013 at 10:38 PM, Andrew Stangl <andrewstangl@gmail.com> wrote:

<.. snip >

Hi,

I enabled index.shard.check_on_startup: true, and the log shows that it's
now checking the indices/shards on startup ... but the problem persists.
Strangely, although there are constant "Failed to perform scheduled engine
refresh" messages in the ES log, the logstash setup still appears to be
functioning this morning after running throughout the night, and the logs
continue to be indexed and are viewable in the Kibana interface.
I'm going to upgrade the ES and logstash packages, since we're on 0.20.6
and 1.1.9 respectively; perhaps the newer 0.90 implementation will
resolve this issue.

Thanks,
Andrew

On Wednesday, July 10, 2013 9:38:32 PM UTC+1, Andrew Stangl wrote:

<.. snip >

Hey,

upgrading is always worth a try. Please keep the Google group informed if
this solves your issue. As a side note, I hope you removed all the indices
which showed that exception (or at least stopped trying to write into
them), as these are most likely corrupted from a Lucene point of view.

Thanks!

--Alex

On Thu, Jul 11, 2013 at 9:01 AM, Andrew Stangl andrewstangl@gmail.comwrote:

Hi,

I enabled index.shard.check_on_**startup: true, and the log shows that
it's now checking the indices/shards on startup ... but the problem
persists.
Strangely, although there are "Failed to perform scheduled engine
refresh" messages constantly in the ES log, the logstash implementation
still appears to be functioning this morning, after running throughout the
night, and the logs continue to be indexed and are view-able in the kibana
interface.
I'm going to upgrade the ES and logstash packages, since we're on 0.20.6
and 1.1.9 respectively, and perhaps the newer 0.90 implementation will
resolve this issue.

Thanks,
Andrew

On Wednesday, July 10, 2013 9:38:32 PM UTC+1, Andrew Stangl wrote:

Hi Jörg,

The server currently has more than 90% free space on the partition with
the elasticsearch data store; this is from a completely fresh index created
automatically by logstash. We did originally experience disk space issues,
but subsequently added a very large volume, and started from fresh.

I'm now going to attempt to start the node with index.shard.check_on_**startup:
true, will let you know how it goes.

Thanks,
Andrew

On Wed, Jul 10, 2013 at 5:46 PM, Jörg Prante joergprante@gmail.comwrote:

It looks like there was temporarily enough disk space while the Lucene
index was written. If so, the index is corrupt and must be checked/repaired
with index.shard.check_on_startup setting on node startup.

Jörg

Am 10.07.13 17:55, schrieb Alexander Reelsen:

Hey,

is it possible that there is an exception in your logfiles before this
happens, which can shed some more light on this issue? Maybe you are
running out of file descriptors (wildly speculating here) or and
OutOfMemoryException happened or something...

--Alex

On Wed, Jul 10, 2013 at 5:29 PM, Andrew Stangl <andrewstangl@gmail.com<mailto:
andrewstangl@gmail.com****>> wrote:

Hi all,

I hope someone will be able to shed some light on this issue:
we're experiencing a problem affecting a single server
elasticsearch server which is being used to store and index tomcat
and syslog data pushed into ES via logstash.

The following entries are coming up in the elasticsearch log on
the server:

[2013-07-10 06:50:31,699][WARN ][index.shard.service      ]
[chiana] [logstash-2013.07.10][3] Failed to perform scheduled
engine refresh
org.elasticsearch.index.engine.RefreshFailedEngineException:
[logstash-2013.07.10][3] Refresh failed

and

[2013-07-10 06:50:34,376][WARN ][index.merge.scheduler    ]
[chiana] [logstash-2013.07.10][2] failed to merge
java.io.EOFException: read past EOF:
NIOFSIndexInput(path="/var/lib/elasticsearch/logstash/nodes/0/indices/logstash-2013.07.10/2/index/_egi.fnm")

the files at the paths indicated in the "failed to merge" messages all
appear to be zero length; not sure whether this is significant.

The logstash server will continue to feed logs into Elasticsearch,
in spite of these messages appearing, but eventually it falls
over, after an indeterminate length of time.
When the logstash server is unable to index into ES, it appears as
though the ES server is rejecting connections, and logstash shows
"unable to index event" messages in its logs... then the indexes
appear to be corrupt, and at this point I've needed to stop the ES
daemon and run the lucene index fix described here -
http://elasticsearch-users.115913.n3.nabble.com/Shard-index-gone-bad-anyone-know-how-to-fix-this-java-io-EOFException-read-past-EOF-NIOFSIndexInput-tp4027683p4028934.html
Once I restart the ES daemon, all seems okay for a while .. then
the problem starts happening again :-/

Is it possible that we're reaching some sort of limitation on the
size of the document that is being pushed into ES by logstash? Is
there any other reason that we would be seeing the log entries
described above?

Thanks in advance!
Andrew
--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Hey Alex,

Thanks for your response, replies inline:

first, this exception is rather a lucene exception - so we should
definitely find out why this happens in elasticsearch, maybe we can prevent it.

Would an upgrade to elasticsearch 0.90.x possibly resolve the issue, since
it uses a newer version of lucene?
We found a good set of munin plugins to try get some more visibility over
what's going on into our graphs, but some of the functionality it's looking
for isn't available in the version of elasticsearch we're using (0.20.6)
... our customer has requested the newer version, so we'll be installing
this anyway, just wondering if this will potentially resolve the issue.

a) You are not accessing the same data directory with two elasticsearch

instances/processes (as this exception can happen when two lucene index
writers are accessing the data)

There are two nodes connecting to the index, but one is the elasticsearch
client built in to logstash, and it's not being detected as a data node,
only a client.

b) You are not running your data directory on an NFS share or some similar
network share?

No, this is running on a SATA RAID - not exactly sure how many disks, but
currently showing a size of 1.8 TB

c) You are not running elasticsearch with one of the problematic JVM
versions for Lucene, seen at http://wiki.apache.org/lucene-java/JavaBugs

Not as far as I can tell, the current JVM on the box:

java version "1.7.0_17"
Java(TM) SE Runtime Environment (build 1.7.0_17-b02)
Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)

For what it's worth, we usually run the same stack on openjdk 6, but our
client uses that version on the rest of their tin (tomcat app)

d) You did not run out of inodes judging from your setup...

Again, not 100% sure, but it doesn't appear to be; is there another
elasticsearch query I can do to confirm this?
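Inode and file-descriptor exhaustion are OS-level conditions rather than something ES 0.20.x reports directly, so they can be checked from the shell. A sketch; the data-directory mount point and the `org.elasticsearch` process pattern are assumptions about this particular install:

```shell
# Inode usage on the filesystem holding the ES data directory:
# 100% IUse% with free blocks remaining would mean inode exhaustion.
df -i /var/lib/elasticsearch 2>/dev/null || df -i /

# Open-file limit for the current shell (run as the user that owns
# the elasticsearch process to see its effective limit):
ulimit -n

# Descriptors currently held by the ES JVM, if one is running:
espid=$(pgrep -f org.elasticsearch | head -1)
[ -n "$espid" ] && ls "/proc/$espid/fd" 2>/dev/null | wc -l || true
```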

e) The most important question, can you give us some data to reproduce
this?

This would be problematic - I'd have to confirm with the client whether
they would be willing to allow this; should they agree, what's the easiest
way to share this? A tarball of the current data directory perhaps? it's
quite large at the moment :-/

Cheers!
Andrew


Awesome, thanks very much - I'll attempt the upgrade and report back here :)

I've been consistently fixing the indices, but it's a maintenance overhead
at the moment, and not sustainable .. let's hope the upgrade resolves the
issue

Cheers!
Andrew
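For reference, the "lucene index fix" linked earlier boils down to running Lucene's CheckIndex tool against the broken shard. A hedged sketch; the jar location and shard path are assumptions, and `-fix` permanently drops unreadable segments (losing the documents in them), so stop ES and back up the shard directory first:

```shell
# Sketch only: stop elasticsearch and back up the shard directory
# before running this; -fix drops corrupt segments for good.
SHARD=/var/lib/elasticsearch/logstash/nodes/0/indices/logstash-2013.07.12/3/index
JAR=$(ls /usr/share/elasticsearch/lib/lucene-core-*.jar 2>/dev/null | head -1)
if [ -n "$JAR" ]; then
  java -cp "$JAR" org.apache.lucene.index.CheckIndex "$SHARD" -fix
else
  echo "lucene-core jar not found; point JAR at your install" >&2
fi
```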

On Thursday, July 11, 2013 8:09:08 AM UTC+1, Alexander Reelsen wrote:

Hey,

upgrading is always worth a try. Please keep the google group informed if
this solved your issue. As a side note, I hope you removed all the indices
which showed that exception (or at least stopped trying to write into
them), as these are most likely corrupted from a lucene point of view.

Thanks!

--Alex


I'm afraid this is still happening, even after upgrading both elasticsearch
and logstash to the latest version:

elasticsearch: 0.90.2
logstash: 1.1.13

most recent stack trace from the elasticsearch logs:

[2013-07-12 14:20:57,078][WARN ][index.shard.service ] [chiana]
[logstash-2013.07.12][3] Failed to perform scheduled engine refresh
org.elasticsearch.index.engine.RefreshFailedEngineException:
[logstash-2013.07.12][3] Refresh failed
at
org.elasticsearch.index.engine.robin.RobinEngine.refresh(RobinEngine.java:796)
at
org.elasticsearch.index.shard.service.InternalIndexShard.refresh(InternalIndexShard.java:412)
at
org.elasticsearch.index.shard.service.InternalIndexShard$EngineRefresher$1.run(InternalIndexShard.java:755)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.EOFException: read past EOF:
NIOFSIndexInput(path="/var/lib/elasticsearch/logstash/nodes/0/indices/logstash-2013.07.12/3/index/_14b.fdx")
at
org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:266)
at
org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:51)
at org.apache.lucene.store.DataInput.readInt(DataInput.java:84)
at
org.apache.lucene.store.BufferedIndexInput.readInt(BufferedIndexInput.java:181)
at
org.apache.lucene.codecs.CodecUtil.checkHeader(CodecUtil.java:126)
at
org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.<init>(CompressingStoredFieldsReader.java:102)
at
org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsReader(CompressingStoredFieldsFormat.java:113)
at
org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:147)
at
org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:56)
at
org.apache.lucene.index.ReadersAndLiveDocs.getReader(ReadersAndLiveDocs.java:121)
at
org.apache.lucene.index.ReadersAndLiveDocs.getReadOnlyClone(ReadersAndLiveDocs.java:218)
at
org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:100)
at
org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:377)
at
org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:275)
at
org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:250)
at
org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:240)
at
org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:170)
at
org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:118)
at
org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:58)
at
org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:155)
at
org.apache.lucene.search.ReferenceManager.maybeRefresh(ReferenceManager.java:204)
at
org.elasticsearch.index.engine.robin.RobinEngine.refresh(RobinEngine.java:777)
... 5 more
(the identical refresh-failure warning and stack trace repeat at
14:20:58,144 and 14:21:00,155, still pointing at _14b.fdx)
[2013-07-12 14:21:09,925][WARN ][index.merge.scheduler ] [chiana]
[logstash-2013.07.12][3] failed to merge
java.io.EOFException: read past EOF:
NIOFSIndexInput(path="/var/lib/elasticsearch/logstash/nodes/0/indices/logstash-2013.07.12/3/index/_14b.fdx")
at
org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:266)
at
org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:51)
at org.apache.lucene.store.DataInput.readInt(DataInput.java:84)
at
org.apache.lucene.store.BufferedIndexInput.readInt(BufferedIndexInput.java:181)
at
org.apache.lucene.codecs.CodecUtil.checkHeader(CodecUtil.java:126)
at
org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.<init>(CompressingStoredFieldsReader.java:102)
at
org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsReader(CompressingStoredFieldsFormat.java:113)
at
org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:147)
at
org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:56)
at
org.apache.lucene.index.ReadersAndLiveDocs.getMergeReader(ReadersAndLiveDocs.java:153)
at
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3700)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3370)
at
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:401)
at
org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:91)
at
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:478)
[2013-07-12 14:21:09,926][WARN ][index.engine.robin ] [chiana]
[logstash-2013.07.12][3] failed engine
org.apache.lucene.index.MergePolicy$MergeException: java.io.EOFException:
read past EOF:
NIOFSIndexInput(path="/var/lib/elasticsearch/logstash/nodes/0/indices/logstash-2013.07.12/3/index/_14b.fdx")
at
org.elasticsearch.index.merge.scheduler.ConcurrentMergeSchedulerProvider$CustomConcurrentMergeScheduler.handleMergeException(ConcurrentMergeSchedulerProvider.java:100)
at
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:514)
Caused by: java.io.EOFException: read past EOF:
NIOFSIndexInput(path="/var/lib/elasticsearch/logstash/nodes/0/indices/logstash-2013.07.12/3/index/_14b.fdx")
at
org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:266)
at
org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:51)
at org.apache.lucene.store.DataInput.readInt(DataInput.java:84)
at
org.apache.lucene.store.BufferedIndexInput.readInt(BufferedIndexInput.java:181)
at
org.apache.lucene.codecs.CodecUtil.checkHeader(CodecUtil.java:126)
at
org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.<init>(CompressingStoredFieldsReader.java:102)
at
org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsReader(CompressingStoredFieldsFormat.java:113)
at
org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:147)
at
org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:56)
at
org.apache.lucene.index.ReadersAndLiveDocs.getMergeReader(ReadersAndLiveDocs.java:153)
at
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3700)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3370)
at
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:401)
at
org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:91)
at
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:478)
[2013-07-12 14:21:10,015][WARN ][cluster.action.shard ] [chiana]
sending failed shard for [logstash-2013.07.12][3],
node[NLPrKhOdQ1GIoNMCN1IyHg], [P], s[STARTED], reason [engine failure,
message [MergeException[java.io.EOFException: read past EOF:
NIOFSIndexInput(path="/var/lib/elasticsearch/logstash/nodes/0/indices/logstash-2013.07.12/3/index/_14b.fdx")];
nested: EOFException[read past EOF:
NIOFSIndexInput(path="/var/lib/elasticsearch/logstash/nodes/0/indices/logstash-2013.07.12/3/index/_14b.fdx")];
]]
[2013-07-12 14:21:10,016][WARN ][cluster.action.shard ] [chiana]
received shard failed for [logstash-2013.07.12][3],
node[NLPrKhOdQ1GIoNMCN1IyHg], [P], s[STARTED], reason [engine failure,
message [MergeException[java.io.EOFException: read past EOF:
NIOFSIndexInput(path="/var/lib/elasticsearch/logstash/nodes/0/indices/logstash-2013.07.12/3/index/_14b.fdx")];
nested: EOFException[read past EOF:
NIOFSIndexInput(path="/var/lib/elasticsearch/logstash/nodes/0/indices/logstash-2013.07.12/3/index/_14b.fdx")];
]]

Not sure what else to look at here ...

After this happens, logs stop flowing through logstash into ES, and the
health check shows:

{
  "cluster_name" : "logstash",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 4,
  "active_shards" : 4,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 1
}
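A red status with one unassigned shard matches the log above: the engine failure caused shard [3] to be failed, and with a single node there is no replica to promote. A small sketch of pulling the two telling fields out of that health output; the heredoc string stands in for a live `curl -s localhost:9200/_cluster/health?pretty` call, where the bind address is an assumption:

```shell
# Stand-in for: curl -s 'localhost:9200/_cluster/health?pretty'
health='{
  "status" : "red",
  "unassigned_shards" : 1
}'
status=$(printf '%s\n' "$health" | sed -n 's/.*"status" : "\([a-z]*\)".*/\1/p')
unassigned=$(printf '%s\n' "$health" | sed -n 's/.*"unassigned_shards" : \([0-9]*\).*/\1/p')
if [ "$status" = "red" ] && [ "$unassigned" -gt 0 ]; then
  echo "cluster red: $unassigned unassigned shard(s) - likely the failed primary"
fi
```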

Any suggestions/help would be greatly appreciated..

Cheers,
Andrew

On Thu, Jul 11, 2013 at 8:16 AM, Andrew Stangl andrewstangl@gmail.comwrote:

Awesome, thanks very much - I'll attempt the upgrade and report back here :)

I've been consistently fixing the indices, but it's a maintenance overhead
at the moment, and not sustainable .. let's hope the upgrade resolves the
issue

Cheers!
Andrew

On Thursday, July 11, 2013 8:09:08 AM UTC+1, Alexander Reelsen wrote:

Hey,

upgrading is always worth a try. Please keep the Google group informed,
if this solved your issue. As a side note, I hope you removed all the
indices which showed that exception (or at least stopped trying to write
into them), as these are most likely corrupted from a Lucene point of view.

Thanks!

--Alex

On Thu, Jul 11, 2013 at 9:01 AM, Andrew Stangl andrew...@gmail.com wrote:

Hi,

I enabled index.shard.check_on_startup: true, and the log shows
that it's now checking the indices/shards on startup ... but the problem
persists.
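For reference, this is the sort of elasticsearch.yml entry involved (a sketch based on the setting named in this thread; the exact accepted values for your 0.20.x/0.90.x version should be double-checked against the docs):

```yaml
# elasticsearch.yml -- shard integrity check on startup (sketch).
# "true" runs a Lucene-level check when a shard is opened and refuses
# corrupted shards; the value "fix" mentioned later in this thread
# additionally attempts a repair, which can drop bad segments (data loss).
index.shard.check_on_startup: true
```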
Strangely, although there are "Failed to perform scheduled engine
refresh" messages constantly in the ES log, the logstash implementation
still appears to be functioning this morning, after running throughout the
night, and the logs continue to be indexed and are viewable in the kibana
interface.
I'm going to upgrade the ES and logstash packages, since we're on 0.20.6
and 1.1.9 respectively, and perhaps the newer 0.90 implementation will
resolve this issue.

Thanks,
Andrew

On Wednesday, July 10, 2013 9:38:32 PM UTC+1, Andrew Stangl wrote:

Hi Jörg,

The server currently has more than 90% free space on the partition with
the elasticsearch data store; this is from a completely fresh index created
automatically by logstash. We did originally experience disk space issues,
but subsequently added a very large volume, and started from fresh.

I'm now going to attempt to start the node with
index.shard.check_on_startup: true, will let you know how it goes.

Thanks,
Andrew

On Wed, Jul 10, 2013 at 5:46 PM, Jörg Prante joerg...@gmail.com wrote:

It looks like there was temporarily not enough disk space while the Lucene
index was written. If so, the index is corrupt and must be checked/repaired
with the index.shard.check_on_startup setting on node startup.

Jörg

On 10.07.13 17:55, Alexander Reelsen wrote:

Hey,

is it possible that there is an exception in your logfiles before
this happens, which can shed some more light on this issue? Maybe you are
running out of file descriptors (wildly speculating here) or an
OutOfMemoryException happened or something...
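On the file-descriptor theory: a quick way to see the limit a process inherits (a sketch; it only reports the current process's own limit, so run it from the same environment/init script that launches the ES JVM):

```python
import resource

# Soft/hard limits on open files for this process. Elasticsearch keeps
# many Lucene segment files open, so a low soft limit (often 1024 by
# default) is a common cause of index trouble under load.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open files: soft={soft} hard={hard}")
```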

--Alex


--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

You should try to either repair the shard with the check_on_startup value
"fix" (index.shard.check_on_startup: fix) or, if that does not help, delete
it and let ES recover the shard from the replica shard, in case you have
replicas configured. But if you have switched off replicas, your data may
be lost.
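The "lucene index fix" referred to throughout this thread is Lucene's CheckIndex tool, which can also be run by hand. A sketch of the invocation (stop the node first; the jar path is an assumption for your install, and -fix can permanently drop unreadable segments, so back up the shard directory):

```shell
# Stop elasticsearch first, and back up the shard directory!
# The lucene-core jar ships inside the elasticsearch lib/ directory.
java -cp /usr/share/elasticsearch/lib/lucene-core-*.jar \
  org.apache.lucene.index.CheckIndex \
  /var/lib/elasticsearch/logstash/nodes/0/indices/logstash-2013.07.12/3/index -fix
```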

Jörg

On 12.07.13 16:29, Andrew Stangl wrote:

I'm afraid this is still happening, even after upgrading both
elasticsearch and logstash to the latest version:

elasticsearch: 0.90.2
logstash: 1.1.13



Hi Jörg,

Thanks for your reply; this issue keeps on re-occurring even with a
completely fresh install, with no index data - I suspect that malformed
logs are somehow being passed through successfully by logstash, and are
causing this issue.. is that a possibility? I've been repeatedly running
the lucene fix tool directly on the indexes after they're corrupt, and it
fixes the problem, but only temporarily. Since disabling a subset of the
logs being parsed by the system, the issue appears to have gone away - but
I'm not entirely sure, only time will tell.

Thanks,
Andrew


This is very fascinating.

If you could save the logstash data subset and somehow create
instructions for a setup to replay it, this would greatly help the ES
engineers to verify whether there is really a hidden bug in
ES/Lucene or whether other mysterious things are going on.

Jörg

On 12.07.13 19:38, Andrew Stangl wrote:

I've been repeatedly running the lucene fix tool directly on the
indexes after they're corrupt, and it fixes the problem, but only
temporarily. Since disabling a subset of the logs being parsed by the
system, the issue appears to have gone away - but I'm not entirely
sure, only time will tell.


I'll see if this is possible - I've discussed it briefly with my client,
but they've mentioned that they have sensitive data in some of the logs
that are being stored - we're going to investigate disabling the nodes
which are forwarding/logging the sensitive data, and try and reproduce the
error without those. I should then be able to create a tarball of the
failed indices for the developers to examine.

This will likely happen in the early part of next week - I'll keep you
updated .. thanks for your help :)

Cheers,
Andrew


I've now discovered that the RAIDed disks in the server are in a degraded
state - it's a mirrored pair, and one disk is showing as failed ... is
it at all possible that this could be the cause of the corrupt indices?
Bear in mind that the indices continue to get corrupted on the server:
each time I wipe out the data store completely and start over with a
completely empty store, Logstash and Elasticsearch will run for a short
space of time (sometimes only 5 to 10 minutes) before failing and showing
the "java.io.EOFException: read past EOF: NIOFSIndexInput(path=" messages
in the logs.
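Given that the corrupt segment files earlier in the thread all showed up zero length, a small hypothetical helper (not part of ES or Logstash) can flag truncated files before Lucene trips over them:

```python
import os

def find_zero_length_files(index_dir):
    """Return paths of zero-byte files under index_dir (e.g. an ES shard's
    index directory). Zero-length segment files like _egi.fnm are a strong
    hint of the truncated writes seen in this thread."""
    hits = []
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getsize(path) == 0:
                hits.append(path)
    return hits
```

Run it against the shard's index directory (e.g. .../nodes/0/indices/logstash-2013.07.12/3/index) while the node is stopped; any hits are candidates for the lucene fix tool, or for deleting and reindexing.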

I'm going to have our customer replace the failed disk and restore the
RAID, and I'm hoping this will resolve the problem .. but I'm not 100%
sure it will.

Thanks,
Andrew


On 14 July 2013 08:11, Andrew Stangl andrewstangl@gmail.com wrote:

I've now discovered that the RAIDed disks in the server are in a degraded
state - it's a mirrored pair, and one disk is showing as failed ... is
it at all possible that this could be the cause of the corrupt indices?
Bear in mind that the indices continue to get corrupted on the server:
each time I wipe out the data store completely and start over with a
completely empty store, Logstash and Elasticsearch will run for a short
space of time (sometimes only 5 to 10 minutes) before failing and showing
the "java.io.EOFException: read past EOF: NIOFSIndexInput(path=" messages
in the logs.

I'm going to have our customer replace the failed disk and restore the
RAID, and I'm hoping this will resolve the problem .. but I'm not 100%
sure it will.

This was my first thought when I read your mails above, but I didn't
mention it because I thought you would already have checked it :)

Given that (a) the index files are corrupted on the filesystem, (b) it
happens repeatedly on new indices, and (c) nobody else is seeing this, the
chances are very good that fixing the RAID will fix this problem.

Also, make sure that you are using the same version of Java on all nodes
(including Java client nodes, if you have them).

clint
