Indexes seem corrupted


(John Chang) #1

We are worried our indexes are corrupted, for a number of reasons. We are going through the logs to see what might have happened, but we still don't have a grasp on it. Any advice on understanding, troubleshooting, and preventing what we are seeing would be greatly appreciated. Thanks.

  1. We keep 4 document types. All of them used to have the desired mappings, but now 2 of the 4 seem to be missing their mappings. Our system maps all 4 types at once, and we are confident those mappings used to be there for every type.

  2. We lost a lot of documents; when we run a count, only a fraction of what used to be there remains.

  3. We are getting an error we've never seen before (see below). The document type in question here still seems to have the correct mappings.

[Failed to execute main query]]; nested: CompileException[[Error: Invalid shift value in prefixCoded string (is encoded value really an INT?)]\n[Near : {... Unknown ....}]\n ^\n[Line: 1, Column: 0]]; nested: NumberFormatException[Invalid shift value in prefixCoded string (is encoded value really an INT?)]; }{[6fe786b4-de13-451c-8296-7803b8bbe1d8][index0][2]: RemoteTransportException[[Angel][inet[/10.198.109.171:9300]][search/phase/query]]; nested: QueryPhaseExecutionException[[index0][2]: query[custom score (+userId:4c6b25774f8bd5147ab46cf4 +(body:"john smith" subject:"john smith" to:"john smith" from:"john smith" cc:"john smith"),function=org.elasticsearch.index.query.xcontent.CustomScoreQueryParser$ScriptScoreFunction@7daf32e3)],from[0],size[100]: Query Failed


(John Chang) #2

I should add that the index was created on Elastic Search 0.11 and we upgraded to 0.12.1 without reindexing (we understood that to be unnecessary since we are not doing geo searches). We tested it after the upgrade and it seemed fine then; we're not sure when it went off the rails.

I don't expect this has anything to do with the upgrade, but I wanted to call it out in case it's useful info.


(Clinton Gormley) #3

Hi John

On Wed, 2010-11-17 at 09:43 -0800, John Chang wrote:

> I should add that the index was created on Elastic Search 0.11 and we upgraded to 0.12.1, without reindexing (which we understood to be not necessary as we are not doing geo searches). We tested it after the upgrade and it seemed fine then; not sure when it went off the rails.
>
> Not expecting this has to do with the upgrade, but just wanted to call it out just in case it was useful info.

This does sound like your indexes have been corrupted somewhere along
the way. You may have been hit by this bug:


Although I'm not sure if that would result in you losing mappings.

Would be worth gist'ing your logs: https://gist.github.com/

clint


(John Chang) #4

Here is a gist of the elastic search logs. However, I don't know if they will be useful; they just log some activity about 2 hours before I started seeing the problems noted above in my application logs, and they seem pretty tame: https://gist.github.com/703964

Here is some more info from my application log. It is basically more of what I put in the original post: https://gist.github.com/704009

I don't know if this is useful, but I can't think of anything more to post. Let me know if there's something else that I'm missing.


(Shay Banon) #5

It might be related to a possible corruption issue that has been fixed
in master (the upcoming 0.13). I also fixed a possible race condition in
which an index operation could get in between the recovery of an index
and the creation of its mappings (this is handled by the new cluster-level
and index-level blocks). It sounds like you might have hit both of them.
I assume you are using the local gateway?

-shay.banon

On Wed, Nov 17, 2010 at 10:23 PM, John Chang jchangkihtest2@gmail.com wrote:

> Here is a gist of the elastic search logs. However, I don't know if they will useful; they just log some activity about 2 hours before I started seeing the problems noted above in my application logs, and they seem pretty tame:
> https://gist.github.com/703964
>
> Here is some more info from my application log. It is basically more of what I put in the original post:
> https://gist.github.com/704009
>
> I don't know if this is useful, but I can't think of anything more to post. Let me know if there's something else that I'm missing.

--
View this message in context:
http://elasticsearch-users.115913.n3.nabble.com/Indexes-seem-corrupted-tp1918553p1919499.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.


(John Chang) #6

I think that's the problem. Yes, we are using the local gateway. Also, what you (kimchy) write makes sense, as the Elastic Search data node logs here https://gist.github.com/703964 show initialization at times that correspond exactly to when the searches started going bad in our application log (which uses the no-data nodes).

The only thing I wonder is why the Elastic Search data nodes decided to reinitialize at that time. We did restart the data node cluster, but that was more than 2 hours before the initialization shown in those logs. What triggers initialization other than a service restart?


(Shay Banon) #7

It seems like the network connection between the nodes was completely
broken (you can see the transport disconnect given as the reason the nodes
were identified as failed).

You can try setting discovery.zen.fd.connect_on_network_disconnect to true.
When such a disconnect happens, it will then try to connect to the node in
question again, to make sure it really can't be connected to.
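A minimal sketch of what that setting might look like as a config fragment. Only the setting name comes from this thread; placing it in each node's config/elasticsearch.yml (the file-based node configuration) and applying it on every node are assumptions, so check the docs for your version before relying on it:

```yaml
# config/elasticsearch.yml (assumed location; presumably set on every node)
# On a transport disconnect, try to reconnect to the node in question
# to make sure it really can't be reached.
discovery.zen.fd.connect_on_network_disconnect: true
```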

-shay.banon

On Thu, Nov 18, 2010 at 12:29 AM, John Chang jchangkihtest2@gmail.com wrote:

> I think that's the problem. Yes, we are using local search. Also, what you (kimchy) write makes sense, as the Elastic Search data node logs here https://gist.github.com/703964 show initialization at times that correspond perfectly to when the searches started going bad in our application log (which uses the no-data nodes).
>
> The only thing I wonder is...why did the Elastic Search data nodes decide to reinitialize at that time; we did restart the data node cluster, but that was over 2 hours before this initialization in those logs. What kicks off the initialization other than a service restart?


