Custom Tokenizer won't uninstall

ralphlevan · March 10, 2016, 7:11pm

I'm trying to write a tokenizer. I installed it and ran into problems. I've uninstalled (removed) it and restarted all my nodes and can find no remains of it on my nodes. But, when I try to create a new index I get an error and my TokenizerFactory is in the stack trace.
Caused by: java.lang.IllegalStateException: [index.version.created] is not present in the index settings for index with uuid: [null]
at org.elasticsearch.Version.indexCreated(Version.java:520)
at org.elasticsearch.index.analysis.Analysis.parseAnalysisVersion(Analysis.java:99)
at org.elasticsearch.index.analysis.AbstractTokenizerFactory.<init>(AbstractTokenizerFactory.java:40)
at org.oclc.lccn.plugin.LCCNTokenizerFactory.<init>(LCCNTokenizerFactory.java:19)
at sun.reflect.GeneratedConstructorAccessor11.newInstance(Unknown Source)

I'm running ES 2.1.0

Any suggestions would be appreciated!

jprante · March 10, 2016, 11:59pm

I'm not sure which commands you executed to get this message, or which API you use.

Here is my test code in Java:

    @Test
    public void testCustomTokenizerRemoval() throws IOException {

        // start node with plugin
        Node node = NodeTestUtils.createNode();
        Client client = node.client();

        // custom tokenizer in settings
        client.admin().indices().prepareCreate("demo")
                .setSettings(copyToStringFromClasspath("settings.json"))
                .addMapping("demo", copyToStringFromClasspath("mapping.json"))
                .execute().actionGet();
        String document = copyToStringFromClasspath("document.json");
        // use tokenizer
        client.prepareIndex("demo", "demo", "1")
                .setSource(document)
                .setRefresh(true).execute().actionGet();
        // forgetting to delete old index with custom tokenizer
        //client.admin().indices().prepareDelete("demo").execute().actionGet();
        node.close();

        // start a new node but without plugin
        node = NodeTestUtils.createNodeWithoutPlugin();
        client = node.client();
        try {
            // create another index without custom tokenizer
            client.admin().indices().prepareCreate("demo")
                    .execute().actionGet();
        } catch (Exception e) {
            logger.warn(e.getMessage(), e);
        }
        NodeTestUtils.releaseNode(node);
    }

The important line is the forgotten index removal.

Here is what happens in my experiment:

Custom tokenizers are defined in the index settings at index creation time.
They are recorded in the AnalysisModule, which holds references for all indices (so tokenizer factories are shared between indices).
An index is created that uses the custom tokenizers.
The index is not deleted.
Stopping the node, removing the plugins, and starting a new node resets the plugins and removes all custom tokenizers.
The cluster recovers the cluster state and all existing indices from disk, and the index settings are collected in the cluster state. But unknown/invalid tokenizers are not applied (unless the translog is dirty) and remain undetected.
Creating a new index whose settings do not reference a custom tokenizer fails with:

org.elasticsearch.indices.IndexCreationException: failed to create index
	at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:362) ~[elasticsearch-2.2.0.jar:2.2.0]
	at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewIndices(IndicesClusterStateService.java:294) [elasticsearch-2.2.0.jar:2.2.0]
	at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:163) [elasticsearch-2.2.0.jar:2.2.0]
	at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:600) [elasticsearch-2.2.0.jar:2.2.0]
	at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:762) [elasticsearch-2.2.0.jar:2.2.0]
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231) [elasticsearch-2.2.0.jar:2.2.0]
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194) [elasticsearch-2.2.0.jar:2.2.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_73]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_73]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_73]
Caused by: java.lang.IllegalArgumentException: Unknown Tokenizer type [...] for [my_tokenizer]
	at org.elasticsearch.index.analysis.AnalysisModule.configure(AnalysisModule.java:267) ~[elasticsearch-2.2.0.jar:2.2.0]
	at org.elasticsearch.common.inject.AbstractModule.configure(AbstractModule.java:61) ~[elasticsearch-2.2.0.jar:2.2.0]
	at org.elasticsearch.common.inject.spi.Elements$RecordingBinder.install(Elements.java:233) ~[elasticsearch-2.2.0.jar:2.2.0]
	at org.elasticsearch.common.inject.spi.Elements.getElements(Elements.java:105) ~[elasticsearch-2.2.0.jar:2.2.0]
	at org.elasticsearch.common.inject.InjectorShell$Builder.build(InjectorShell.java:143) ~[elasticsearch-2.2.0.jar:2.2.0]
	at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:99) ~[elasticsearch-2.2.0.jar:2.2.0]
	at org.elasticsearch.common.inject.InjectorImpl.createChildInjector(InjectorImpl.java:159) ~[elasticsearch-2.2.0.jar:2.2.0]
	at org.elasticsearch.common.inject.ModulesBuilder.createChildInjector(ModulesBuilder.java:55) ~[elasticsearch-2.2.0.jar:2.2.0]
	at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:358) ~[elasticsearch-2.2.0.jar:2.2.0]
	... 9 more

But, if you had deleted the index with the custom tokenizer before the node restart, it works successfully.
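
To make the point above concrete, here is a minimal sketch of how one might spot an index whose settings still reference a tokenizer type that is no longer installed. It is not Elasticsearch code: the class `OrphanedTokenizerCheck`, its method, and the type name `hypothetical_plugin_type` are made up for illustration, and it assumes you already have the index settings as a flat map of dotted keys (for example, copied from the index settings output). The only thing taken from Elasticsearch is the key layout `index.analysis.tokenizer.<name>.type`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper, not part of Elasticsearch: reports tokenizers in an
// index's flattened settings whose type is not in the set of available types.
class OrphanedTokenizerCheck {

    private static final String PREFIX = "index.analysis.tokenizer.";
    private static final String SUFFIX = ".type";

    // Returns tokenizer name -> tokenizer type for every configured tokenizer
    // whose type is missing from availableTypes.
    static Map<String, String> findUnknownTokenizers(Map<String, String> flatSettings,
                                                     Set<String> availableTypes) {
        Map<String, String> unknown = new TreeMap<>();
        for (Map.Entry<String, String> entry : flatSettings.entrySet()) {
            String key = entry.getKey();
            boolean isTokenizerType = key.startsWith(PREFIX)
                    && key.endsWith(SUFFIX)
                    && key.length() > PREFIX.length() + SUFFIX.length();
            if (isTokenizerType && !availableTypes.contains(entry.getValue())) {
                String name = key.substring(PREFIX.length(), key.length() - SUFFIX.length());
                unknown.put(name, entry.getValue());
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        // Settings of an index that was created while the plugin was installed.
        // "hypothetical_plugin_type" is a placeholder, not a real type name.
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("index.number_of_shards", "1");
        settings.put("index.analysis.tokenizer.my_tokenizer.type", "hypothetical_plugin_type");
        settings.put("index.analysis.analyzer.my_analyzer.tokenizer", "my_tokenizer");

        // Types assumed to be available once the plugin has been removed.
        Set<String> available = new HashSet<>(Arrays.asList("standard", "keyword", "whitespace"));

        System.out.println(findUnknownTokenizers(settings, available));
    }
}
```

Any index reported by a check like this would need to be deleted (or the plugin reinstalled) before the node is restarted without the plugin.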

ralphlevan · March 11, 2016, 3:13pm

That's very helpful to know that the tokenizer reference is hiding in an index definition. It doesn't seem to help in correcting my situation.

The steps I took to cause the situation were:

Install custom plugin.
Using Sense, create test index referencing the new tokenizer
PUT /test
{
   "settings": {
      "analysis": {
         "analyzer": {
            "tokenizer": "LCCNTokenizer"
         }
      }
   }
}

which generated this error:
{
   "error": {
      "root_cause": [
         {
            "type": "settings_exception",
            "reason": "Failed to get setting group for [index.analysis.analyzer.] setting prefix and setting [index.analysis.analyzer.tokenizer] because of a missing '.'"
         }
      ],
      "type": "settings_exception",
      "reason": "Failed to get setting group for [index.analysis.analyzer.] setting prefix and setting [index.analysis.analyzer.tokenizer] because of a missing '.'"
   },
   "status": 500
}

At this point, there is no "test" index. Any attempt to delete it gets a 404. Any attempt to create a new index with no reference to the new tokenizer gets the error:
{
   "error": {
      "root_cause": [
         {
            "type": "remote_transport_exception",
            "reason": "[esap03][10.192.215.62:9300][indices:admin/create]"
         }
      ],
      "type": "illegal_state_exception",
      "reason": "[index.version.created] is not present in the index settings for index with uuid: [null]"
   },
   "status": 500
}

Online searches on that error had suggested an errant tokenizer and sure enough, there's my new tokenizer mentioned in the stacktrace.

At this point, I have no indexes, no plugins and no ability to create new indexes without seeing references to plugins that no longer exist.

Suggestions on where I go from here would be appreciated. I'd be okay with reinstalling, but I'm sure I'm not done learning how to build and install tokenizers and will probably have to repeat this exercise a couple more times.

Thanks!

jprante · March 11, 2016, 6:23pm

The syntax is not correct, and ES reports an error. Here is a correct version

PUT /test
{
   "settings": {
      "index": {
         "analysis": {
            "analyzer": {
               "my_analyzer": {
                  "type": "custom",
                  "tokenizer": "LCCNTokenizer"
               }
            }
         }
      }
   }
}

I suspect the illegal_state_exception is also caused by wrong syntax but I can't be sure as long as I don't see the index creation command.
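
For readers wondering why the first request was rejected with "missing '.'": the nested JSON settings are flattened into dotted keys, and everything under `index.analysis.analyzer.` is expected to look like `<analyzer_name>.<property>`. The sketch below is a simplified imitation of that grouping step, written only to illustrate the idea. The class `SettingsGroupSketch` and its method are hypothetical and are not the Elasticsearch implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified, hypothetical imitation of grouping flattened settings by prefix.
class SettingsGroupSketch {

    // Groups keys of the form <prefix><group>.<property> into group -> (property -> value).
    // A key with nothing but a bare name after the prefix cannot be assigned to a group.
    static Map<String, Map<String, String>> groupByPrefix(Map<String, String> flat, String prefix) {
        Map<String, Map<String, String>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : flat.entrySet()) {
            String key = entry.getKey();
            if (!key.startsWith(prefix)) {
                continue; // unrelated setting
            }
            String rest = key.substring(prefix.length());
            int dot = rest.indexOf('.');
            if (dot == -1) {
                throw new IllegalArgumentException(
                        "setting [" + key + "] under prefix [" + prefix + "] is missing a '.'");
            }
            groups.computeIfAbsent(rest.substring(0, dot), k -> new LinkedHashMap<>())
                  .put(rest.substring(dot + 1), entry.getValue());
        }
        return groups;
    }

    public static void main(String[] args) {
        String prefix = "index.analysis.analyzer.";

        // Flattened form of the corrected request: analyzer name, then property.
        Map<String, String> good = new LinkedHashMap<>();
        good.put("index.analysis.analyzer.my_analyzer.type", "custom");
        good.put("index.analysis.analyzer.my_analyzer.tokenizer", "LCCNTokenizer");
        System.out.println(groupByPrefix(good, prefix));

        // Flattened form of the rejected request: no analyzer name level.
        Map<String, String> bad = new LinkedHashMap<>();
        bad.put("index.analysis.analyzer.tokenizer", "LCCNTokenizer");
        try {
            groupByPrefix(bad, prefix);
        } catch (IllegalArgumentException ex) {
            System.out.println("rejected: " + ex.getMessage());
        }
    }
}
```

In the rejected request, `"tokenizer"` sits where an analyzer name was expected, so the flattened key `index.analysis.analyzer.tokenizer` has no property part left after the name.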

ralphlevan · March 11, 2016, 6:40pm

Thanks for the correction. I'm still just learning this and expect to make more mistakes before I'm done.

That "PUT /test" was the extent of the index creation for "test". The other index creation that is now failing is one I've been using for ages:
PUT /viaf
{
   "settings": {
      "number_of_shards": 12,
      "refresh_interval": "-1",
      "number_of_replicas": 0
   }
}

Thanks!

Ralph

ralphlevan · March 11, 2016, 8:05pm

I just reinstalled ES 2.1.0 on top of my old installation. That zapped my es.yml file and I had to rediscover how to rebuild the cluster. While it was just 2 hosts in the cluster, everything worked fine. When I added all the hosts back into the cluster, I ran into the same error when trying to create a new index. There is clearly something lingering on one of those nodes.

ralphlevan · March 11, 2016, 8:24pm

Found it. Two copies of ES running on esap03. I'd wondered why that node was always the one mentioned in the error messages. Something to add to the FAQ.

Thanks for the help, Jörg! Back to trying to get my tokenizer to work.

jprante · March 11, 2016, 9:07pm

Glad you are slowly making friends with Elasticsearch!

The default settings are a bit obscure. A running node may get into a situation where it just sits there and hangs, not reacting to a kill (-INT) and thus not exiting, but still listening on its ports, accepting commands, and executing them. Another node can then be started unintentionally, allocating the next available ports, without any warning that another node is already there. kill -9 (KILL) as a last resort may help in these situations.

So the first thing is either to watch out for the current process list and allocated ports, ready to kill hanging nodes, or to assign a fixed port number and a fixed name to each intended node instance, so Elasticsearch moans about it when you try to start a second node instance unintentionally.
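
As a rough illustration of the "watch the allocated ports" advice, here is a small sketch that checks, before starting a node, whether something is already listening on the usual Elasticsearch ports. The class `PortCheck` and its methods are hypothetical, and the port list (9200/9201 for HTTP, 9300/9301 for transport) only assumes the default port ranges. A failed bind is treated as "in use", although it can also have other causes.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-start check: which of the given TCP ports are already taken?
class PortCheck {

    // Tries to bind the port on all local addresses. If the bind fails, the port
    // is treated as in use (most likely another process is listening on it).
    static boolean isPortInUse(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    // Returns the subset of ports that could not be bound.
    static List<Integer> findBusyPorts(int... ports) {
        List<Integer> busy = new ArrayList<>();
        for (int port : ports) {
            if (isPortInUse(port)) {
                busy.add(port);
            }
        }
        return busy;
    }

    public static void main(String[] args) {
        // Assumed defaults: first node on 9200/9300, a second node on 9201/9301.
        List<Integer> busy = findBusyPorts(9200, 9201, 9300, 9301);
        if (busy.isEmpty()) {
            System.out.println("No node seems to be listening on the default ports.");
        } else {
            System.out.println("Ports already in use: " + busy);
        }
    }
}
```

If 9201 or 9301 shows up as busy on a host that should run a single node, a second (possibly hung) instance is probably still around.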
