Create index with ngram filter fails with embedded node

ilBepi · May 31, 2018, 12:42pm

Hello,
I am starting a local Elasticsearch cluster from Java as follows:

        import java.util.Arrays;
        import java.util.Collection;
        import org.elasticsearch.common.settings.Settings;
        import org.elasticsearch.node.Node;
        import org.elasticsearch.plugins.Plugin;
        import org.elasticsearch.transport.Netty4Plugin;

        Settings.Builder settingsBuilder = Settings.builder();

        settingsBuilder.put("http.enabled", "false");
        settingsBuilder.put("cluster.name", esearchConfig.getClusterName() +
            (esearchConfig.isEnableTemporaryMode() ? "-transient" : ""));
        settingsBuilder.put("node.name", esearchConfig.getNodeName() +
            (esearchConfig.isEnableTemporaryMode() ? "-transient" : ""));
        settingsBuilder.put("node.data", "true");
        settingsBuilder.put("node.master", "true");
        settingsBuilder.put("action.auto_create_index", "false");
        settingsBuilder.put("transport.bind_host", "127.0.0.1");
        settingsBuilder.put("transport.tcp.port", 1312);
        settingsBuilder.put("http.bind_host", "127.0.0.1");
        settingsBuilder.put("http.port", 1313);

        // Only the Netty4 transport/HTTP module is passed to the node as a plugin
        Collection<Class<? extends Plugin>> plugins = Arrays.asList(Netty4Plugin.class);
        Node elasticSearchInternalNode = new ElasticSearchNode(settingsBuilder.build(), plugins);
        elasticSearchInternalNode.start();

Where ElasticSearchNode is a simple class that extends Node and has the following constructor:

        public ElasticSearchNode(Settings settings, Collection<Class<? extends Plugin>> plugins) {
            super(InternalSettingsPreparer.prepareEnvironment(settings, null), plugins);
        }

Then I try to create an index with an nGram token filter:

curl -X PUT "localhost:1313/test04" -H 'Content-Type: application/json' -d'{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 2
    },
    "analysis": {
      "filter": {
        "short_ngram_filter": {
          "type": "nGram",
          "min_gram": "3",
          "max_gram": "3"
        }
      }
    }
  }
}'

But it fails:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Unknown filter type [nGram] for [short_ngram_filter]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "Unknown filter type [nGram] for [short_ngram_filter]"
  },
  "status": 400
}

The strange thing is that an nGram tokenizer works perfectly, for example:

curl -X PUT "localhost:1313/test04" -H 'Content-Type: application/json' -d'{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 2
    },
    "analysis": {
      "tokenizer": {
        "short_ngram_tokenizer": {
          "type": "nGram",
          "min_gram": "3",
          "max_gram": "3"
        }
      }
    }
  }
}'
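The two requests are not equivalent, even once both load. Below is a small self-contained Java illustration of what each configuration would emit with min_gram = max_gram = 3. It is not Lucene's NGramTokenizer or NGramTokenFilter: it ignores positions, offsets and surrogate pairs. It also assumes the tokenizer keeps every character (the documented meaning of an empty token_chars) and that the filter sits behind a whitespace tokenizer. The filter request defines no analyzer, so that choice is an assumption. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only, not Lucene code: compares an n-gram tokenizer applied to
// the whole field text with a whitespace tokenizer followed by an n-gram filter.
// Ignores positions, offsets and surrogate pairs.
class TokenizerVsFilterSketch {

    // All substrings of exactly n characters, left to right
    // (min_gram == max_gram == n, as in both requests above).
    static List<String> grams(String text, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= text.length(); i++) {
            out.add(text.substring(i, i + n));
        }
        return out;
    }

    // n-gram tokenizer keeping every character (empty token_chars):
    // the raw text is the input, so grams may span spaces.
    static List<String> ngramTokenizer(String text, int n) {
        return grams(text, n);
    }

    // Assumed whitespace tokenizer, then an n-gram filter on each token:
    // grams never cross token boundaries.
    static List<String> whitespaceThenNgramFilter(String text, int n) {
        List<String> out = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            out.addAll(grams(token, n));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "quick fox";
        System.out.println("tokenizer: " + ngramTokenizer(text, 3));
        System.out.println("filter:    " + whitespaceThenNgramFilter(text, 3));
    }
}
```

For "quick fox" the tokenizer sketch yields [qui, uic, ick, ck , k f,  fo, fox] and the filter sketch yields [qui, uic, ick, fox]. A token shorter than three characters yields nothing in the filter sketch. So replacing the filter with the tokenizer would change what gets indexed rather than being a drop-in workaround.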

Does anyone have an idea what I am doing wrong?
I am using Elasticsearch 6.2.4.

Thanks very much

Giuseppe

dadoonet · May 31, 2018, 1:24pm

Starting Elasticsearch embedded is not supported, and this won't work.

Why do you want to start Elasticsearch that way?

ilBepi · May 31, 2018, 1:31pm

Hello @dadoonet, thanks for your answer.
I know that it is not supported, but unfortunately I have to work with code that starts Elasticsearch that way, and I cannot change it for now.
I am trying to migrate it to Elasticsearch 6.2.4.
Still, it seems very strange to me that creating an index with an nGram tokenizer works but creating one with an nGram filter does not.
I would like to know if anyone has an idea why.
Thanks very much.

dadoonet · May 31, 2018, 1:50pm

You can't do it IMO, or at least it will require a lot of manual effort,
because you are missing a library that is not available on Maven Central or in the Elasticsearch Maven repository.
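One likely cause is offered here as an assumption, not something confirmed in this thread. In 6.x the nGram token filter is registered by org.elasticsearch.analysis.common.CommonAnalysisPlugin from the analysis-common module. A regular installation loads that module from its modules/ directory, but an embedded Node only gets the plugins passed to its constructor, and the snippet in the question passes only Netty4Plugin. The nGram tokenizer evidently comes from the core jar, since it works with the same setup. The hypothetical helper below (PluginClasspathCheck and missing are illustrative names, not Elasticsearch API) uses plain reflection to report which plugin classes cannot be loaded before the node is started.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-start check, not part of the Elasticsearch API: reports which
// plugin classes cannot be loaded on the embedded node's classpath.
class PluginClasspathCheck {

    // Assumption: in 6.x this class (shipped in the analysis-common module,
    // not in the core jar) registers the nGram token filter.
    static final String ANALYSIS_COMMON =
            "org.elasticsearch.analysis.common.CommonAnalysisPlugin";

    // Returns the subset of classNames that cannot be loaded.
    static List<String> missing(List<String> classNames) {
        List<String> result = new ArrayList<>();
        ClassLoader loader = PluginClasspathCheck.class.getClassLoader();
        for (String name : classNames) {
            try {
                // initialize = false: only locate and load the class, skip static initializers
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                // LinkageError: the class exists but one of its dependencies does not
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> missing = missing(List.of(ANALYSIS_COMMON));
        if (missing.isEmpty()) {
            System.out.println("analysis-common plugin class found");
        } else {
            System.out.println("Cannot load: " + missing);
        }
    }
}
```

If the class loads, the assumed fix is to pass it to the node together with the transport plugin, e.g. Arrays.asList(Netty4Plugin.class, CommonAnalysisPlugin.class). If it does not, the module jar would presumably have to be taken from the modules/analysis-common directory of a matching 6.2.4 distribution and added to the project by hand, which is the manual effort mentioned above.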

system · June 28, 2018, 1:50pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.