'TokenStream contract violation' error after update to 0.90.10

Hello folks,

I've just upgraded to 0.90.10 from 0.90.3.

When indexing I'm now getting the following message:

[2014-01-13 15:34:37,116][WARN ][cluster.action.shard ] [Kogar] [
corpusv5][4] sending failed shard for [corpusv5][4], node[
dRuzryJqQ7eDsSj68YjFHg], [R], s[INITIALIZING], indexUUID [na], reason [
Failed to start shard, message [RecoveryFailedException[[corpusv5][4]:
Recovery failed from [Force][QXLJK6rkQb-u7DtzaN0V_A][inet[/10.33.159.105:
9300]] into [Kogar][dRuzryJqQ7eDsSj68YjFHg][inet[/10.33.159.105:9301]]];nested
: RemoteTransportException[[Force][inet[/10.33.159.105:9300]][index/shard/
recovery/startRecovery]]; nested: RecoveryEngineException[[corpusv5][4]
Phase[2] Execution failed]; nested: RemoteTransportException[[Kogar][inet[/
10.33.159.105:9301]][index/shard/recovery/translogOps]]; nested:
IndexFailedEngineException[[corpusv5][4] Index failed for [bikmo#60246]];
nested: IllegalStateException[TokenStream contract violation:
reset()/close() call missing, reset() called multiple times, or subclass
does not call super.reset(). Please see Javadocs of TokenStream class for
more information about the correct consuming workflow.]; ]]

This doesn't occur on every document being indexed and I'm not sure what is
causing it.

Does anybody have any ideas?

Many thanks in advance.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/f13391ff-3961-4202-8c0f-74a579affadd%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

This error is caused by an analyzer in your mapping which no longer
conforms to the Lucene TokenStream API contract, which has become stricter
in the latest release, Lucene 4.6.

Are you perhaps using a plugin with a custom analyzer?

You must fix the analyzer code (i.e. call super.reset()) as it says in the
error message.
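
To illustrate what the error message is complaining about, here is a minimal sketch of the contract. It is a toy model only: ToyStream, ToySource, ToyFilter, GoodLowercase and BrokenLowercase are hypothetical names made up for this example and are not Lucene classes. The real Lucene workflow also includes an end() call, which is left out here. The point is just that a filter which overrides reset() without calling super.reset() never resets the stream it wraps, so the first attempt to read a token fails.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy model of the consuming contract. These are NOT Lucene classes. */
final class TokenStreamContractSketch {

    /** Hypothetical stand-in for a token stream. */
    abstract static class ToyStream {
        abstract void reset();
        abstract String next(); // returns null when the stream is exhausted
        abstract void close();
    }

    /** Source that refuses to produce tokens until reset() has reached it. */
    static final class ToySource extends ToyStream {
        private final String[] tokens;
        private int pos = -1; // -1 means "reset() not called yet"

        ToySource(String text) { this.tokens = text.trim().split("\\s+"); }

        @Override void reset() { pos = 0; }

        @Override String next() {
            if (pos < 0) {
                throw new IllegalStateException("contract violation: reset() call missing");
            }
            return pos < tokens.length ? tokens[pos++] : null;
        }

        @Override void close() { pos = -1; }
    }

    /** Base filter: forwards reset() and close() to the wrapped stream. */
    static class ToyFilter extends ToyStream {
        protected final ToyStream input;
        protected int seen; // per-stream state that subclasses want to clear

        ToyFilter(ToyStream input) { this.input = input; }

        @Override void reset() { input.reset(); }

        @Override String next() {
            String t = input.next();
            if (t == null) return null;
            seen++;
            return t.toLowerCase(Locale.ROOT);
        }

        @Override void close() { input.close(); }
    }

    /** Correct subclass: clears its own state AND calls super.reset(). */
    static final class GoodLowercase extends ToyFilter {
        GoodLowercase(ToyStream input) { super(input); }
        @Override void reset() { super.reset(); seen = 0; }
    }

    /** Broken subclass: clears its own state but never calls super.reset(). */
    static final class BrokenLowercase extends ToyFilter {
        BrokenLowercase(ToyStream input) { super(input); }
        @Override void reset() { seen = 0; } // bug: wrapped stream is never reset
    }

    /** Consumer workflow in this model: reset(), read until null, then close(). */
    static List<String> consume(ToyStream stream) {
        List<String> out = new ArrayList<>();
        try {
            stream.reset();
            for (String t = stream.next(); t != null; t = stream.next()) {
                out.add(t);
            }
        } finally {
            stream.close();
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(consume(new GoodLowercase(new ToySource("Road Bike"))));
        try {
            consume(new BrokenLowercase(new ToySource("Road Bike")));
        } catch (IllegalStateException e) {
            System.out.println("broken filter: " + e.getMessage());
        }
    }
}
```

In this sketch the consumer does everything right in both cases; only the subclass differs. That is why the message lists "subclass does not call super.reset()" as one of the possible causes.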

Jörg


This error is caused by an analyzer in your mapping which no longer
conforms to the Lucene TokenStream API contract, which has become stricter
in the latest release, Lucene 4.6.

Ah, thank you for the pointer. I'll investigate further in that direction.

Are you perhaps using a plugin with a custom analyzer?

As far as I'm aware, only standard analyzers are being used. We've got
head, bigdesk, and hq plugins but that's it.

The index has the following settings, in which I can't see anything
out of the ordinary:
{
  "settings":{
    "analysis":{
      "filter": {
        "camel_caps" : {
          "type" : "pattern_capture",
          "preserve_original" : 1,
          "patterns" : [
            "(\\p{Ll}+|\\p{Lu}\\p{Ll}+|\\p{Lu}+)",
            "(\\d+)"
          ]
        },
        "simple_stemmer": {
          "type": "stemmer",
          "name": "minimal_english"
        },
        "word_mash": {
          "type": "shingle",
          "token_separator": ""
        },
        "word_dash": {
          "type": "pattern_replace",
          "pattern": "[^0-9a-z]+",
          "replacement": "-"
        },
        "partial_middle":{
          "type":"nGram",
          "max_gram":50,
          "min_gram":2
        }
      },
      "analyzer":{
        "search":{
          "filter":[
            "trim",
            "asciifolding",
            "camel_caps",
            "lowercase",
            "simple_stemmer",
            "word_mash"
          ],
          "type":"custom",
          "tokenizer":"standard"
        },
        "index": {
          "filter":[
            "trim",
            "asciifolding",
            "lowercase",
            "word_dash"
          ],
          "type":"custom",
          "tokenizer":"keyword"
        },
        "suggest": {
          "filter": [
            "trim",
            "asciifolding",
            "lowercase",
            "partial_middle"
          ],
          "type": "custom",
          "tokenizer": "standard"
        }
      }
    }
  }
}

Hmmm...


I'm not sure, but the
org.apache.lucene.analysis.pattern.PatternReplaceFilter source code in
Lucene 4.6 looks odd; it may be causing the error.

Jörg


Maybe a simplistic approach but:

I un-installed and re-installed ES and voila: errors gone.


I see, maybe you used some old Lucene jars by accident.

Jörg


Hi,

In your case, StandardTokenizer was the culprit. If an older version was
on your classpath (lucene-analyzers-common-4.5.x or similar), it may have
been loaded instead of the correct 4.6 one. In 4.6, together with the other
contract improvements, StandardTokenizer gained the super.reset() call that
was missing in older versions: [Apache-SVN] Diff of /lucene/dev/branches/branch_4x/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/StandardTokenizer.java
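
If you want to check for this kind of leftover, a rough sketch like the following can list artifacts that appear in more than one version in a directory of jars. The class name DuplicateJarCheck, the default "lib" directory and the assumption that jars follow Maven-style names such as lucene-core-4.6.0.jar are all mine for illustration; adjust them to your installation.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Collection;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: reports artifacts present in more than one version. */
final class DuplicateJarCheck {

    // Assumes Maven-style file names: <artifact>-<version>.jar
    private static final Pattern JAR =
        Pattern.compile("^(.+?)-(\\d+(?:\\.\\d+)*(?:-[A-Za-z0-9]+)?)\\.jar$");

    /** Maps artifact name to its versions, keeping only artifacts with 2+ versions. */
    static Map<String, Set<String>> conflictingVersions(Collection<String> jarNames) {
        Map<String, Set<String>> byArtifact = new TreeMap<>();
        for (String name : jarNames) {
            Matcher m = JAR.matcher(name);
            if (m.matches()) {
                byArtifact.computeIfAbsent(m.group(1), k -> new TreeSet<>()).add(m.group(2));
            }
        }
        // Drop artifacts that occur in a single version only.
        byArtifact.values().removeIf(versions -> versions.size() < 2);
        return byArtifact;
    }

    public static void main(String[] args) {
        // The directory to scan is an assumption; pass your own as the first argument.
        File lib = new File(args.length > 0 ? args[0] : "lib");
        String[] names = lib.list();
        if (names == null) {
            System.err.println("not a directory: " + lib);
            return;
        }
        conflictingVersions(Arrays.asList(names))
            .forEach((artifact, versions) -> System.out.println(artifact + " -> " + versions));
    }
}
```

This only looks at file names in one directory, so it will not catch jars pulled in from elsewhere on the classpath, but it is a quick first check after an in-place upgrade.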

Uwe (who did the changes with Robert Muir)
