I'm trying to write a custom tokenizer. I installed the plugin and ran into problems, so I uninstalled it and restarted all my nodes; I can find no remains of it on any node. But when I try to create a new index, I get an error, and my TokenizerFactory is in the stack trace:
Caused by: java.lang.IllegalStateException: [index.version.created] is not present in the index settings for index with uuid: [null]
at org.elasticsearch.Version.indexCreated(Version.java:520)
at org.elasticsearch.index.analysis.Analysis.parseAnalysisVersion(Analysis.java:99)
at org.elasticsearch.index.analysis.AbstractTokenizerFactory.<init>(AbstractTokenizerFactory.java:40)
at org.oclc.lccn.plugin.LCCNTokenizerFactory.<init>(LCCNTokenizerFactory.java:19)
at sun.reflect.GeneratedConstructorAccessor11.newInstance(Unknown Source)
I'm not sure what commands you executed to get to this message, or what API you used.
Here is my test code in Java:
@Test
public void testCustomTokenizerRemoval() throws IOException {
    // start node with plugin
    Node node = NodeTestUtils.createNode();
    Client client = node.client();
    // custom tokenizer in settings
    client.admin().indices().prepareCreate("demo")
            .setSettings(copyToStringFromClasspath("settings.json"))
            .addMapping("demo", copyToStringFromClasspath("mapping.json"))
            .execute().actionGet();
    String document = copyToStringFromClasspath("document.json");
    // use tokenizer
    client.prepareIndex("demo", "demo", "1")
            .setSource(document)
            .setRefresh(true).execute().actionGet();
    // forgetting to delete the old index with the custom tokenizer
    //client.admin().indices().prepareDelete("demo").execute().actionGet();
    node.close();
    // start a new node, but without the plugin
    node = NodeTestUtils.createNodeWithoutPlugin();
    client = node.client();
    try {
        // create another index without the custom tokenizer
        client.admin().indices().prepareCreate("demo")
                .execute().actionGet();
    } catch (Exception e) {
        logger.warn(e.getMessage(), e);
    }
    NodeTestUtils.releaseNode(node);
}
The important line is the forgotten index removal.
Here is what happens in my experiment:
custom tokenizers are defined in the index settings at index creation time
they are recorded in the AnalysisModule, which holds references for all indices (so tokenizer factories are shared between indices)
an index is created which uses the custom tokenizers
the index is not deleted
stopping the node, removing the plugin, and starting the node again resets the plugins and removes all custom tokenizers
the cluster recovers the cluster state and all existing indices from disk, and the index settings are collected into the cluster state; but unknown/invalid tokenizers are not applied (unless the translog is dirty) and remain undetected
creating a new index, even with settings that do not reference the custom tokenizer, barks with
org.elasticsearch.indices.IndexCreationException: failed to create index
at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:362) ~[elasticsearch-2.2.0.jar:2.2.0]
at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewIndices(IndicesClusterStateService.java:294) [elasticsearch-2.2.0.jar:2.2.0]
at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:163) [elasticsearch-2.2.0.jar:2.2.0]
at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:600) [elasticsearch-2.2.0.jar:2.2.0]
at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:762) [elasticsearch-2.2.0.jar:2.2.0]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231) [elasticsearch-2.2.0.jar:2.2.0]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194) [elasticsearch-2.2.0.jar:2.2.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_73]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_73]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_73]
Caused by: java.lang.IllegalArgumentException: Unknown Tokenizer type [...] for [my_tokenizer]
at org.elasticsearch.index.analysis.AnalysisModule.configure(AnalysisModule.java:267) ~[elasticsearch-2.2.0.jar:2.2.0]
at org.elasticsearch.common.inject.AbstractModule.configure(AbstractModule.java:61) ~[elasticsearch-2.2.0.jar:2.2.0]
at org.elasticsearch.common.inject.spi.Elements$RecordingBinder.install(Elements.java:233) ~[elasticsearch-2.2.0.jar:2.2.0]
at org.elasticsearch.common.inject.spi.Elements.getElements(Elements.java:105) ~[elasticsearch-2.2.0.jar:2.2.0]
at org.elasticsearch.common.inject.InjectorShell$Builder.build(InjectorShell.java:143) ~[elasticsearch-2.2.0.jar:2.2.0]
at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:99) ~[elasticsearch-2.2.0.jar:2.2.0]
at org.elasticsearch.common.inject.InjectorImpl.createChildInjector(InjectorImpl.java:159) ~[elasticsearch-2.2.0.jar:2.2.0]
at org.elasticsearch.common.inject.ModulesBuilder.createChildInjector(ModulesBuilder.java:55) ~[elasticsearch-2.2.0.jar:2.2.0]
at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:358) ~[elasticsearch-2.2.0.jar:2.2.0]
... 9 more
But if you delete the index with the custom tokenizer before the node restart, everything works fine.
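In Java client terms, the missing cleanup is exactly the delete that is commented out in the test above; it has to run while the plugin is still installed, before the node goes down. A minimal sketch (the index name "demo" comes from the test):

// while the plugin is still installed, drop the index
// that references the custom tokenizer
client.admin().indices().prepareDelete("demo").execute().actionGet();

After that, the node can be restarted without the plugin and new indices can be created normally.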
It's very helpful to know that the tokenizer reference is hiding in an index definition, but it doesn't seem to help in correcting my situation.
The steps I took to cause the situation were:
Install the custom plugin.
Using Sense, create a test index referencing the new tokenizer:
PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "tokenizer": "LCCNTokenizer"
      }
    }
  }
}
which generated this error:
{
  "error": {
    "root_cause": [
      {
        "type": "settings_exception",
        "reason": "Failed to get setting group for [index.analysis.analyzer.] setting prefix and setting [index.analysis.analyzer.tokenizer] because of a missing '.'"
      }
    ],
    "type": "settings_exception",
    "reason": "Failed to get setting group for [index.analysis.analyzer.] setting prefix and setting [index.analysis.analyzer.tokenizer] because of a missing '.'"
  },
  "status": 500
}
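In hindsight, the settings_exception is about the shape of the JSON: "tokenizer" sits directly under "analyzer" instead of inside a named analyzer definition. A sketch of what the settings were presumably meant to look like, with the analyzer name "lccn_analyzer" made up for illustration:

PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "lccn_analyzer": {
          "type": "custom",
          "tokenizer": "LCCNTokenizer"
        }
      }
    }
  }
}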
At this point, there is no "test" index. Any attempt to delete it gets a 404. Any attempt to create a new index with no reference to the new tokenizer gets the error:
{
  "error": {
    "root_cause": [
      {
        "type": "remote_transport_exception",
        "reason": "[esap03][10.192.215.62:9300][indices:admin/create]"
      }
    ],
    "type": "illegal_state_exception",
    "reason": "[index.version.created] is not present in the index settings for index with uuid: [null]"
  },
  "status": 500
}
Online searches on that error suggested an errant tokenizer, and sure enough, there's my new tokenizer mentioned in the stack trace.
At this point, I have no indexes, no plugins, and no ability to create new indexes without seeing references to plugins that no longer exist.
Suggestions on where to go from here would be appreciated. I'd be okay with reinstalling, but I'm sure I'm not done learning how to build and install tokenizers and will probably have to repeat this exercise a couple more times.
Thanks for the correction. I'm still just learning this and expect to make more mistakes before I'm done.
That "PUT /test" was the extent of the index creation for "test". The other index creation that is now failing is one I've been using for ages:
PUT /viaf
{
  "settings": {
    "number_of_shards": 12,
    "refresh_interval": "-1",
    "number_of_replicas": 0
  }
}
I just reinstalled ES 2.1.0 on top of my old installation. That zapped my elasticsearch.yml file, and I had to rediscover how to rebuild the cluster. While it was just two hosts in the cluster, everything worked fine. When I added all the hosts back into the cluster, I ran into the same error when trying to create a new index. There is clearly something lingering on one of those nodes.
Found it. Two copies of ES running on esap03. I'd wondered why that node was always the one mentioned in the error messages. Something to add to the FAQ.
Thanks for the help, Jörg! Back to trying to get my tokenizer to work.
Glad you are slowly making friends with Elasticsearch!
The default settings are a bit obscure. A node may run into situations where it just sits there and hangs, not reacting to a kill (-INT) and thus not exiting, but still listening on its ports, accepting commands, and executing them. Another node can then be started unintentionally, allocating the next available ports, without any warning that another node is already there. As a last resort, kill -9 (KILL) may help in these situations.
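For example, a quick way to spot a stray node on a host (Linux commands; the PID is machine-specific):

ps -ef | grep -i elasticsearch          # more than one ES process per host is suspicious
netstat -tlnp | grep -E ':9[23]0[0-9]'  # who is listening on the 9200/9300 port ranges
kill -9 <pid>                           # last resort for a node that ignores -INT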
So the first thing is either to watch the process list and allocated ports, ready to kill hanging nodes, or to assign a fixed port number and a fixed name to each intended node instance, so that Elasticsearch moans about it when you try to start a second node instance unintentionally.
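A minimal sketch of such fixed settings in elasticsearch.yml (the values are just examples):

node.name: esap03-node1
http.port: 9200
transport.tcp.port: 9300

With the ports pinned instead of picked from a range, a second instance on the same host fails to bind at startup instead of silently taking 9201/9301.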