Indexing issues


(morus.walter.ml) #1

Hi,

I'm currently trying to build an Elasticsearch index
and am running into some trouble.

Indexing is based on database data and basically has three
steps:
a) index all database content
b) index incremental changes that happened during step a
(until almost all are done)
c) permanently index incremental changes

b and c differ in that b is part of the generation of a new
index, whereas c is permanent index maintenance.
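
For illustration, the three steps above can be sketched as follows. This is a minimal Python sketch rather than the actual Ruby indexer: `FakeDatabase`, its sequence-numbered change log, and the plain dict standing in for the index are all hypothetical, and the real change tracking may work differently.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDatabase:
    """Hypothetical stand-in for the source database and its change log."""
    rows: dict = field(default_factory=dict)     # id -> document
    changes: list = field(default_factory=list)  # (seq, id, document or None)

    def write(self, doc_id, doc):
        """Insert or update a row (doc is a dict) or delete it (doc is None)."""
        if doc is None:
            self.rows.pop(doc_id, None)
        else:
            self.rows[doc_id] = doc
        self.changes.append((len(self.changes) + 1, doc_id, doc))

    def changes_since(self, seq):
        """Return all logged changes with a sequence number above seq."""
        return [c for c in self.changes if c[0] > seq]


def apply_changes(index, changes, seq):
    """Apply a batch of changes; return the last sequence number applied."""
    for change_seq, doc_id, doc in changes:
        if doc is None:
            index.pop(doc_id, None)  # deleting a missing document is a no-op
        else:
            index[doc_id] = doc
        seq = change_seq
    return seq


def full_index(db, index):
    """Step a: index all database content; return the change-log position at start."""
    start_seq = len(db.changes)
    index.update(db.rows)
    return start_seq


def catch_up(db, index, seq, max_backlog=0):
    """Step b: replay changes until at most max_backlog changes are pending."""
    while True:
        pending = db.changes_since(seq)
        if len(pending) <= max_backlog:
            return seq
        seq = apply_changes(index, pending, seq)


def maintain_once(db, index, seq):
    """Step c: one pass of permanent maintenance, meant to be run repeatedly."""
    return apply_changes(index, db.changes_since(seq), seq)
```

With `max_backlog=0`, step b runs until no changes are pending; a larger value hands over to step c once the backlog is small ("almost all are done").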

For two of the three indices this works fine.
For the third, which is the biggest and most complicated
(using child documents), step a works fine, but after a handful
of updates in step b Elasticsearch crashes the index and it
becomes unusable. The same happens if I run step c, leaving out b.

My indexer then dies from an HTTP timeout.

My first thought was that there might be issues in the incremental
indexers (b and/or c).
However, if I run the same indexer against a small partial version
of that index, everything works fine.

The size of the full index is ~12 million documents and 7.2 GB.
I also tried a smaller index with just 3 million documents and
1.4 GB in size, but it fails in the same way.

There are some indexing operations where I (perhaps naively) assumed
Elasticsearch would take care of the difficulties:

  • it is possible that documents are deleted that do not exist;
    this seems to work fine, I get a 'not found' in these cases
  • it is possible that child documents are indexed where the parent
    does not exist;
    I do not see errors in that situation. I did not check whether the
    documents are created; I would be fine with either rejecting them
    or adding them. Index corruption is not so great, though.
  • it is possible that child documents are deleted where no document
    with that parent id exists, whether or not other child documents
    with that parent exist

I tried to minimize these cases, but that had no effect on the crashes.
I cannot fully avoid them without searching first, which I have so far
wanted to avoid.
But the same conditions can occur in the case of incremental indexing
on top of the small partial index (having some 30k documents),
and I see no problems there.

I was mostly using ES 1.0.1 but finally tried 1.0.2 as well.
I started with two instances (on two different
servers) and one replica but reduced that to one instance and no
replica in order to rule out complications from replication.
This did not have any effect, so I added the second instance again and
now have two instances and no replica. The index has 6 shards, three on
each instance.
Each instance has 8 GB of memory and 64k file descriptors configured.
The machines have 16 GB of memory.

The OS is Linux (Ubuntu 10.04). The JVM reports:
java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)

The indexers are written in Ruby using the elasticsearch gem.

The index corruption shows up as some shards in the state UNASSIGNED.
I had some luck with indices being fixed on server restart, but
in other cases (one replica; no replica but only one instance)
the failure seemed to be unfixable.
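
One way to see which shards are affected is the `_cat/shards` endpoint (available since Elasticsearch 1.0), which prints one line per shard. The sketch below (Python, for illustration only) filters such output for UNASSIGNED shards. The sample text is made up, and the assumed column order (index, shard, prirep, state, then further columns) should be checked against the output of the running version.

```python
def unassigned_shards(cat_shards_text):
    """Return (index, shard, prirep) for every shard in state UNASSIGNED.

    Assumes whitespace-separated columns that start with
    index, shard, prirep, state, printed without a header line.
    """
    result = []
    for line in cat_shards_text.splitlines():
        cols = line.split()
        if len(cols) >= 4 and cols[3] == "UNASSIGNED":
            result.append((cols[0], int(cols[1]), cols[2]))
    return result


# Made-up sample output, not taken from the cluster described in this thread.
SAMPLE = """\
candidates_v0004 0 p STARTED 2000000 1.2gb 10.0.0.1 node-a
candidates_v0004 1 p STARTED 2000000 1.2gb 10.0.0.2 node-b
candidates_v0004 5 p UNASSIGNED
"""

if __name__ == "__main__":
    for index_name, shard, prirep in unassigned_shards(SAMPLE):
        print(index_name, shard, prirep)
```

Only the first four columns are read, so node names that contain spaces do not affect the result.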

See below for the initial error messages.
I do not see any errors in the response messages for the indexing
and deletion requests.
In the past the ES server went into a state producing huge amounts
(>1 GB) of error messages. In my latest tests (with smaller indices)
this did not happen (though the number of replicas differs as
well).
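
As an aside on checking responses: if the requests go through the bulk API (an assumption; the post does not say), per-item failures are reported inside the response body rather than as an HTTP error. The following Python sketch separates benign "not found" deletes from other failures. The item shape used here, a one-key dict per action carrying `_id`, `status` and optionally `error`, is assumed and should be verified against the Elasticsearch version in use.

```python
def summarize_bulk_items(response):
    """Split bulk response items into benign misses and real failures.

    Assumes each entry of response["items"] looks like
    {"delete": {"_id": "7", "status": 404}}.
    """
    missing, failed = [], []
    for item in response.get("items", []):
        for action, result in item.items():
            status = result.get("status", 200)
            if action == "delete" and status == 404:
                # Deleting a document that does not exist: expected here.
                missing.append(result.get("_id"))
            elif status >= 300 or "error" in result:
                failed.append((action, result.get("_id"), result.get("error", status)))
    return missing, failed


# Hypothetical response used only to exercise the function.
EXAMPLE_RESPONSE = {
    "items": [
        {"index": {"_id": "1", "status": 201}},
        {"delete": {"_id": "2", "status": 404, "found": False}},
        {"index": {"_id": "3", "status": 400, "error": "example error"}},
    ]
}

if __name__ == "__main__":
    print(summarize_bulk_items(EXAMPLE_RESPONSE))
```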

Where else could I look to understand why the shard is failing?
Any explanations, or at least guesses, as to what might be going wrong?

best
Morus

PS:
The initial error messages look like
[2014-04-10 15:58:14,463][WARN ][index.merge.scheduler ] [pjpp-production master] [candidates_v0004][5] failed to merge
org.apache.lucene.store.AlreadyClosedException: this IndexReader is closed
    at org.apache.lucene.index.IndexReader.ensureOpen(IndexReader.java:252)
    at org.apache.lucene.index.CompositeReader.getContext(CompositeReader.java:102)
    at org.apache.lucene.index.CompositeReader.getContext(CompositeReader.java:56)
    at org.apache.lucene.index.IndexReader.leaves(IndexReader.java:502)
    at org.elasticsearch.index.search.child.DeleteByQueryWrappingFilter.contains(DeleteByQueryWrappingFilter.java:122)
    at org.elasticsearch.index.search.child.DeleteByQueryWrappingFilter.getDocIdSet(DeleteByQueryWrappingFilter.java:81)
    at org.elasticsearch.common.lucene.search.ApplyAcceptedDocsFilter.getDocIdSet(ApplyAcceptedDocsFilter.java:45)
    at org.apache.lucene.search.ConstantScoreQuery$ConstantWeight.scorer(ConstantScoreQuery.java:142)
    at org.apache.lucene.search.FilteredQuery$RandomAccessFilterStrategy.filteredScorer(FilteredQuery.java:533)
    at org.apache.lucene.search.FilteredQuery$1.scorer(FilteredQuery.java:133)
    at org.apache.lucene.search.QueryWrapperFilter$1.iterator(QueryWrapperFilter.java:59)
    at org.apache.lucene.index.BufferedUpdatesStream.applyQueryDeletes(BufferedUpdatesStream.java:546)
    at org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:284)
    at org.apache.lucene.index.IndexWriter._mergeInit(IndexWriter.java:3844)
    at org.apache.lucene.index.IndexWriter.mergeInit(IndexWriter.java:3806)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3659)
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
    at org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:107)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
[2014-04-10 15:58:14,464][WARN ][index.engine.internal ] [pjpp-production master] [candidates_v0004][5] failed engine
org.apache.lucene.index.MergePolicy$MergeException: org.apache.lucene.store.AlreadyClosedException: this IndexReader is closed
    at org.elasticsearch.index.merge.scheduler.ConcurrentMergeSchedulerProvider$CustomConcurrentMergeScheduler.handleMergeException(ConcurrentMergeSchedulerProvider.java:109)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:518)
Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexReader is closed
    at org.apache.lucene.index.IndexReader.ensureOpen(IndexReader.java:252)
    at org.apache.lucene.index.CompositeReader.getContext(CompositeReader.java:102)
    at org.apache.lucene.index.CompositeReader.getContext(CompositeReader.java:56)
    at org.apache.lucene.index.IndexReader.leaves(IndexReader.java:502)
    at org.elasticsearch.index.search.child.DeleteByQueryWrappingFilter.contains(DeleteByQueryWrappingFilter.java:122)
    at org.elasticsearch.index.search.child.DeleteByQueryWrappingFilter.getDocIdSet(DeleteByQueryWrappingFilter.java:81)
    at org.elasticsearch.common.lucene.search.ApplyAcceptedDocsFilter.getDocIdSet(ApplyAcceptedDocsFilter.java:45)
    at org.apache.lucene.search.ConstantScoreQuery$ConstantWeight.scorer(ConstantScoreQuery.java:142)
    at org.apache.lucene.search.FilteredQuery$RandomAccessFilterStrategy.filteredScorer(FilteredQuery.java:533)
    at org.apache.lucene.search.FilteredQuery$1.scorer(FilteredQuery.java:133)
    at org.apache.lucene.search.QueryWrapperFilter$1.iterator(QueryWrapperFilter.java:59)
    at org.apache.lucene.index.BufferedUpdatesStream.applyQueryDeletes(BufferedUpdatesStream.java:546)
    at org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:284)
    at org.apache.lucene.index.IndexWriter._mergeInit(IndexWriter.java:3844)
    at org.apache.lucene.index.IndexWriter.mergeInit(IndexWriter.java:3806)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3659)
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
    at org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:107)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/20140410170122.24d09521%40tucholsky.experteer.muc.
For more options, visit https://groups.google.com/d/optout.


(Alexander Reelsen) #2

Hey,

can you create a GitHub issue for this (and add as much info as possible)?
And maybe try with Elasticsearch 1.1 as well? Do your changes involve
deletes as well?

Thanks!

--Alex


(morus.walter.ml) #3

Hello Alexander,

thanks for your answer.

> can you create a GitHub issue for this (and add as much info as possible)?
> And maybe try with Elasticsearch 1.1 as well? Do your changes involve
> deletes as well?

Yes, there are deletes as well.

I'll continue my investigation and will create the GitHub issue when I'm
done.

> There are some indexing operations where I (perhaps naively) assumed
> Elasticsearch would take care of the difficulties:
>
>   • it is possible that documents are deleted that do not exist;
>     this seems to work fine, I get a 'not found' in these cases
>   • it is possible that child documents are indexed where the parent
>     does not exist;
>     I do not see errors in that situation. I did not check whether the
>     documents are created; I would be fine with either rejecting them
>     or adding them. Index corruption is not so great, though.
>   • it is possible that child documents are deleted where no document
>     with that parent id exists, whether or not other child documents
>     with that parent exist

Any comment on these operations?

Regards
Morus


