None of this API appears to be documented (?), so I'm going off of code samples from other plugins/tokenizers, but when I restart elastic having deployed my tokenizer I get this error constantly in the logs:
[2017-09-20 08:45:37,412][WARN ][indices.cluster ] [Samuel Silke] [[storm-crawler-2017-09-11][3]] marking and sending shard failed due to [failed to create index]
[storm-crawler-2017-09-11] IndexCreationException[failed to create index]; nested: CreationException[Guice creation errors:
1) Could not find a suitable constructor in com.cameraforensics.elasticsearch.plugins.UrlTokenizerFactory. Classes must have either one (and only one) constructor annotated with @Inject or a zero-argument constructor that is not private.
at com.cameraforensics.elasticsearch.plugins.UrlTokenizerFactory.class(Unknown Source)
at org.elasticsearch.index.analysis.TokenizerFactoryFactory.create(Unknown Source)
at org.elasticsearch.common.inject.assistedinject.FactoryProvider2.initialize(Unknown Source)
at _unknown_
1 error];
at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:360)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewIndices(IndicesClusterStateService.java:294)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:163)
at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:610)
at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:772)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.elasticsearch.common.inject.CreationException: Guice creation errors:
1) Could not find a suitable constructor in com.cameraforensics.elasticsearch.plugins.UrlTokenizerFactory. Classes must have either one (and only one) constructor annotated with @Inject or a zero-argument constructor that is not private.
at com.cameraforensics.elasticsearch.plugins.UrlTokenizerFactory.class(Unknown Source)
at org.elasticsearch.index.analysis.TokenizerFactoryFactory.create(Unknown Source)
at org.elasticsearch.common.inject.assistedinject.FactoryProvider2.initialize(Unknown Source)
at _unknown_
1 error
at org.elasticsearch.common.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:360)
at org.elasticsearch.common.inject.InjectorBuilder.injectDynamically(InjectorBuilder.java:172)
at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:110)
at org.elasticsearch.common.inject.InjectorImpl.createChildInjector(InjectorImpl.java:157)
at org.elasticsearch.common.inject.ModulesBuilder.createChildInjector(ModulesBuilder.java:55)
at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:358)
... 9 more
My tokenizer is built for v2.3.4, and the TokenizerFactory looks like this:
public class UrlTokenizerFactory extends AbstractTokenizerFactory {
@Inject
public UrlTokenizerFactory(Index index, IndexSettingsService indexSettings, @Assisted String name, @Assisted Settings settings){
super(index, indexSettings.getSettings(), name, settings);
}
@Override
public Tokenizer create() {
return new UrlTokenizer();
}
}
I genuinely don't know what I'm doing wrong. Have I deployed it incorrectly? It appears to be using my classes according to the logs...
I've only deployed it to one of my es nodes (4-node cluster). The /_cat/plugins?v endpoint gives this:
name component version type url
Samuel Silke urltokenizer 2.3.4.0 j
What are you the entire contents of your tokenizer .java file (in particular, the imports)? Note that Tokenizers were change in 5.0 to be "deguiced", so they should no longer have these types of pains when developing.
A shot in the dark, but are you sure that the IndexSettingsService is used
in the constructor? Try using the index settings directly (which should be
injected):
The guice stuff is all initialized in Node.java. At least, in there you can see which Module classes are loaded, which setup bindings in guice. It seems to find your class ok, but then does not find the constructor. Is the snippet you gave the entirety of your tokenizer factory class? I had first thought maybe you were using the Inject annotation from javax or something like that, which is why I asked to see the imports. But the classes in your ctor parameters seem to match up with those in other tokenizer factories.
I tried your "shot in the dark". It compiled, and I got a slightly different error this time:
Caused by: java.lang.IllegalStateException: [index.version.created] is not present in the index settings for index with uuid: [null]
at org.elasticsearch.Version.indexCreated(Version.java:584)
at org.elasticsearch.index.analysis.Analysis.parseAnalysisVersion(Analysis.java:99)
at org.elasticsearch.index.analysis.AbstractTokenizerFactory.<init>(AbstractTokenizerFactory.java:40)
Yup. It's calling my tokenizer. But now it's revealed that my tokenizer is in fact crap!
Caused by: java.lang.IndexOutOfBoundsException
at org.apache.lucene.analysis.tokenattributes.CharTermAttributeImpl.append(CharTermAttributeImpl.java:131)
at com.cameraforensics.elasticsearch.plugins.UrlTokenizer.incrementToken(UrlTokenizer.java:30)
Probably because - as there are no docs - I'm doing it wrong.
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.