Unknown filter type [stemmer]

I have this

{
  "analysis": {
    "analyzer": {
      "sv_analyzer": {
        "type": "custom",
        "filter": ["standard", "lowercase", "sv_stop_filter", "sv_stem"],
        "tokenizer": "standard"
      }
    },
    "filter": {
      "sv_stop_filter": {
        "type": "stop",
        "stopwords": ["_swedish_"]
      },
      "sv_stem": {
        "type": "stemmer",
        "name": "swedish"
      }
    }
  }
}

in a file named "elasticsearch_combo.json", and I get this exception:

Exception in thread "main" java.lang.IllegalArgumentException: Unknown filter type [stemmer] for [sv_stem]
	at org.elasticsearch.index.analysis.AnalysisRegistry.getAnalysisProvider(AnalysisRegistry.java:389)

My code is as follows

public static Node elasticSearchTestNode(String path_home, String clustername) throws NodeValidationException, IOException {
    Node node = new PluginConfigurableNode(
            Settings.builder()
                    .put("transport.type", "netty4")
                    .put("http.type", "netty4")
                    .put("http.enabled", "true")
                    .put("path.home", path_home)
                    .put("cluster.name", clustername)
                    //.put(getSome())
                    .build(),
            Arrays.asList(Netty4Plugin.class));
    node.start();
    return node;
}

private static class PluginConfigurableNode extends Node {
    public PluginConfigurableNode(Settings preparedSettings, Collection<Class<? extends Plugin>> classpathPlugins) {
        super(InternalSettingsPreparer.prepareEnvironment(preparedSettings, null), classpathPlugins);
    }
}

public static Client getClient(String clustername, String publishHost, int networkPort) throws Exception {
    Settings settings = Settings.builder()
            .put("transport.type", "netty4")
            .put("http.type", "netty4")
            .put("http.enabled", "true")
            .put("cluster.name", clustername)
            .build();

    TransportClient client = new PreBuiltTransportClient(settings)
            .addTransportAddress(new TransportAddress(InetAddress.getByName(publishHost), networkPort));

    IndicesAdminClient indices = client.admin().indices();

    // Load the analysis settings from the classpath and create the index with them
    JSONParser parser = new JSONParser();
    Object adminSettings = parser.parse(new InputStreamReader(
            ElasticSearchTest.class.getClassLoader().getResourceAsStream("elasticsearch_combo.json")));
    indices.prepareCreate(indexName).setSettings((Map) adminSettings).execute().actionGet();

    return client;
}

public static void main(String[] args) throws Exception {
    Node elasticSearchNode = elasticSearchTestNode(path_home, clustername);

    final Client client = getClient(clustername, publishHost, networkPort);

    // The exception is thrown before this point
}
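One fragile spot in the code above: `getResourceAsStream` returns `null` when `elasticsearch_combo.json` is not on the classpath, and `new InputStreamReader(null)` then fails with a `NullPointerException` that hides the real cause. Below is a minimal sketch, assuming Java 9+ (for `InputStream.readAllBytes`) and a hypothetical helper class named `ResourceLoader`, that reads the resource and fails with a clear message instead:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ResourceLoader {
    // Reads a classpath resource such as "elasticsearch_combo.json" into a String.
    // getResourceAsStream returns null when the file is not on the classpath,
    // so fail early with a clear message instead of a later NullPointerException.
    static String readResource(String name) throws IOException {
        try (InputStream in = ResourceLoader.class.getClassLoader().getResourceAsStream(name)) {
            if (in == null) {
                throw new IllegalStateException("Resource not found on classpath: " + name);
            }
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        try {
            String json = readResource("elasticsearch_combo.json");
            System.out.println("Loaded " + json.length() + " characters of index settings");
        } catch (IllegalStateException e) {
            // Reached when the settings file is missing from the classpath
            System.out.println(e.getMessage());
        }
    }
}
```

The returned string can then go to the JSON parser as before, since json-simple's `JSONParser` also has a `parse(String)` overload.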

Any help is appreciated. My pom.xml is as follows

<dependency>
    <groupId>com.googlecode.json-simple</groupId>
    <artifactId>json-simple</artifactId>
    <version>1.1.1</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch.plugin</groupId>
    <artifactId>transport-netty4-client</artifactId>
    <version>6.0.0</version>
</dependency>
<dependency> 
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>6.0.0</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>transport</artifactId>
    <version>6.0.0</version>
</dependency>

and I also have

<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-core</artifactId>
    <version>2.9.1</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch.plugin</groupId>
    <artifactId>analysis-icu</artifactId>
    <version>2.4.6</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-analysis-icu</artifactId>
    <version>2.7.0</version>
</dependency>

I have read that running ES embedded is not advised, but also that it works for many people. What I want is a small example with a Swedish stemmer. If I remove sv_stem from both the filter list and the analyzer, it works. I have switched it to English, but the same exception is thrown. I can provide the full code for this. Another rationale is that I want one autonomous process for indexing my documents; the files created by ES will then be moved to a production environment. Code I developed at a previous job used ES 1.6, and my impression is that this was simpler then.

You should definitely remove analysis-icu and elasticsearch-analysis-icu.

They are not compatible with Elasticsearch 6.0.0.

But with 6.0 you are hitting this:

I have tried removing them entirely. I added them when I found some analysis code that tokenized using an ICU concept. I could not get it to work, so I removed everything in the hope of getting something working.

I'm mostly puzzled that stemmer is an unknown type when there are so many examples using it on the web.

That's for the reason I explained in the GitHub link.
Those filters are no longer part of the core code; they now live in the analysis-common module.

Because you are running Elasticsearch embedded, that module is not loaded, so you don't have access to stemmer.
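The failure can be pictured with a toy model. In the sketch below (Java 9+; every name is a hypothetical stand-in, not Elasticsearch's real `AnalysisRegistry` or plugin classes), the core knows only the filters the thread shows still working, and the `stemmer` type exists only if a plugin standing in for analysis-common is loaded:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FilterRegistryDemo {
    // Hypothetical stand-in for an analysis plugin: it contributes token filter type names.
    interface AnalysisPluginLike {
        Set<String> tokenFilterTypes();
    }

    // Hypothetical stand-in for the analysis-common module, the one that provides "stemmer".
    static final AnalysisPluginLike COMMON_ANALYSIS = () -> Set.of("stemmer");

    // Collects the filter types known to the node: the core ones plus those of every loaded plugin.
    static Set<String> buildRegistry(List<AnalysisPluginLike> plugins) {
        // In this toy model the core only has the filters that still worked in the thread.
        Set<String> known = new HashSet<>(Set.of("standard", "lowercase", "stop"));
        for (AnalysisPluginLike plugin : plugins) {
            known.addAll(plugin.tokenFilterTypes());
        }
        return known;
    }

    // Fails with the same message format as the stack trace when a type is not registered.
    static void requireFilterType(Set<String> registry, String type, String filterName) {
        if (!registry.contains(type)) {
            throw new IllegalArgumentException(
                    "Unknown filter type [" + type + "] for [" + filterName + "]");
        }
    }

    public static void main(String[] args) {
        // Embedded node without the module: the "stemmer" lookup fails.
        Set<String> embedded = buildRegistry(List.of());
        try {
            requireFilterType(embedded, "stemmer", "sv_stem");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }

        // Same node with the stand-in module registered: the lookup succeeds.
        Set<String> withCommon = buildRegistry(List.of(COMMON_ANALYSIS));
        requireFilterType(withCommon, "stemmer", "sv_stem");
        System.out.println("stemmer is registered");
    }
}
```

The first case prints the same message as the stack trace in the question; the second mirrors passing `CommonAnalysisPlugin.class` to the embedded node so the module's filters get registered.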

You need to run real integration tests against a real Elasticsearch cluster, as I explained there.
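A minimal sketch of a precondition check for such tests, assuming the cluster runs separately on a host and port you supply (9200 is Elasticsearch's default HTTP port and 9300 its default transport port). It only checks that something accepts TCP connections there, not that the cluster is healthy; the class name `ClusterGuard` is hypothetical:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ClusterGuard {
    // Returns true if something accepts a TCP connection on host:port within timeoutMs.
    // This does not verify cluster health, only that the port is open.
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host could not be resolved
            return false;
        }
    }

    public static void main(String[] args) {
        boolean up = isReachable("localhost", 9200, 1000);
        System.out.println(up
                ? "cluster reachable, running integration tests"
                : "no cluster on localhost:9200, skipping integration tests");
    }
}
```

In a JUnit test, the result could feed `Assume.assumeTrue(...)` (JUnit 4) or `Assumptions.assumeTrue(...)` (JUnit 5), so the integration tests are skipped rather than failed when no cluster is running.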

Thank you, David. I am very grateful; it is very kind of you to provide this answer.

Here is something

Arrays.asList(new Class[] {Netty4Plugin.class, CommonAnalysisPlugin.class}));

with

		<dependency>
		    <groupId>org.codelibs.elasticsearch.module</groupId>
		    <artifactId>analysis-common</artifactId>
		    <version>6.0.0</version>
		</dependency>
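`new Class[] {...}` is a raw array, so passing it where `Collection<Class<? extends Plugin>>` is expected typically compiles only with an unchecked warning. Below is a minimal sketch with hypothetical stand-in classes (not the real Elasticsearch `Plugin`, `Netty4Plugin` or `CommonAnalysisPlugin`) that builds a typed list with an explicit type witness instead:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class PluginListDemo {
    // Hypothetical stand-ins for Plugin, Netty4Plugin and CommonAnalysisPlugin.
    static class Plugin {}
    static class Netty4Like extends Plugin {}
    static class CommonAnalysisLike extends Plugin {}

    // Same parameter type as the PluginConfigurableNode constructor in the thread.
    static List<String> pluginNames(Collection<Class<? extends Plugin>> classpathPlugins) {
        List<String> names = new ArrayList<>();
        for (Class<? extends Plugin> pluginClass : classpathPlugins) {
            names.add(pluginClass.getSimpleName());
        }
        return names;
    }

    public static void main(String[] args) {
        // The explicit type witness yields a List<Class<? extends Plugin>>
        // without a raw Class[] array, so no unchecked conversion is needed.
        List<Class<? extends Plugin>> plugins =
                Arrays.<Class<? extends Plugin>>asList(Netty4Like.class, CommonAnalysisLike.class);
        System.out.println(pluginNames(plugins));
    }
}
```

With the real classes the same shape would be `Arrays.<Class<? extends Plugin>>asList(Netty4Plugin.class, CommonAnalysisPlugin.class)`.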

Here is some more

            Arrays.asList(new Class[] { Netty4Plugin.class, CommonAnalysisPlugin.class, AnalysisICUPlugin.class,PainlessPlugin.class})

with

	<dependency>
		<groupId>org.codelibs.elasticsearch.module</groupId>
		<artifactId>analysis-common</artifactId>
		<version>6.0.0-rc2</version>
	</dependency>

Here is my general point of view: embedded execution makes sense. The Elastic team is strongly against it, and I don't think they should be. My rationale is:

  • I can make projects that build with tests straight from the repo. No environment setup.
  • With 200,000 PDF documents I can still run the indexing with only 4 GB of RAM. That is large enough to be a relevant application.
  • Simple deployment/upgrade with one jar by using Spring Boot.

I am no longer waiting for answers in this thread so I'll mark it solved after this.

