Unknown filter type [stemmer]

karejonsson · November 29, 2017, 10:28am

I have this

{
  "analysis": {
    "analyzer": {
      "sv_analyzer": {
        "type": "custom",
        "filter": ["standard", "lowercase", "sv_stop_filter", "sv_stem"],
        "tokenizer": "standard"
      }
    },
    "filter": {
      "sv_stop_filter": {
        "type": "stop",
        "stopwords": ["_swedish_"]
      },
      "sv_stem": {
        "type": "stemmer",
        "name": "swedish"
      }
    }
  }
}

in a file named "elasticsearch_combo.json", and I get this

Exception in thread "main" java.lang.IllegalArgumentException: Unknown filter type [stemmer] for [sv_stem]
	at org.elasticsearch.index.analysis.AnalysisRegistry.getAnalysisProvider(AnalysisRegistry.java:389)

My code is as follows

public static Node elasticSearchTestNode(String path_home, String clustername) throws NodeValidationException, IOException {
    Node node = new PluginConfigurableNode(
            Settings.builder()
                    .put("transport.type", "netty4")
                    .put("http.type", "netty4")
                    .put("http.enabled", "true")
                    .put("path.home", path_home)
                    .put("cluster.name", clustername)
                    //.put(getSome())
                    .build(),
            Arrays.asList(Netty4Plugin.class));
    node.start();
    return node;
}

private static class PluginConfigurableNode extends Node {
    public PluginConfigurableNode(Settings preparedSettings, Collection<Class<? extends Plugin>> classpathPlugins) {
        super(InternalSettingsPreparer.prepareEnvironment(preparedSettings, null), classpathPlugins);
    }
}

public static Client getClient(String clustername, String publishHost, int networkPort) throws Exception {
    Settings settings = Settings.builder()
            .put("transport.type", "netty4")
            .put("http.type", "netty4")
            .put("http.enabled", "true")
            .put("cluster.name", clustername)
            .build();

    TransportClient client = new PreBuiltTransportClient(settings)
            .addTransportAddress(new TransportAddress(InetAddress.getByName(publishHost), networkPort));

    IndicesAdminClient indices = client.admin().indices();

    JSONParser parser = new JSONParser();
    Object adminSettings = parser.parse(new InputStreamReader(ElasticSearchTest.class.getClassLoader().getResourceAsStream("elasticsearch_combo.json")));
    indices.prepareCreate(indexName).setSettings((Map) adminSettings).execute().actionGet();

    return client;
}

public static void main(String args[]) throws Exception {

    Node elastcSearchNode = elasticSearchTestNode(path_home, clustername);

    final Client client = getClient(clustername, publishHost, networkPort);

    // The exception is thrown before this point

Any help is appreciated. My pom.xml is as follows

<dependency>
    <groupId>com.googlecode.json-simple</groupId>
    <artifactId>json-simple</artifactId>
    <version>1.1.1</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch.plugin</groupId>
    <artifactId>transport-netty4-client</artifactId>
    <version>6.0.0</version>
</dependency>
<dependency> 
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>6.0.0</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>transport</artifactId>
    <version>6.0.0</version>
</dependency>

and I also have

<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-core</artifactId>
    <version>2.9.1</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch.plugin</groupId>
    <artifactId>analysis-icu</artifactId>
    <version>2.4.6</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-analysis-icu</artifactId>
    <version>2.7.0</version>
</dependency>

I have read that running ES embedded is not advised, but also that it works for many. What I want is a small example with a Swedish stemmer. If I remove sv_stem from both the filter definitions and the analyzer's filter list, it works. I have swapped the Swedish stemmer for the English one, but the same exception is thrown. I can provide the full code for this. Another rationale is that I want one autonomous process for indexing my documents. The files created by ES will then be moved to a production environment. Code I developed at a previous job used ES 1.6, and my impression is that this was simpler then.

dadoonet · November 29, 2017, 12:50pm

For sure, remove analysis-icu and elasticsearch-analysis-icu.

They are not compatible.

But with 6.0 you are hitting this:

karejonsson · November 29, 2017, 1:13pm

I have tried removing them entirely. I had added them when I found some analysis code that tokenized using an ICU concept. I could not get it to work, so I removed them all in the hope of getting something to work.

I'm mostly puzzled by how stemmer can be an unknown type when there are so many examples on the web.

dadoonet · November 29, 2017, 1:32pm

That's because of the reason I explained in the link on GitHub.
Those filters are no longer part of the core code; they now live in a module, analysis-common.

Because you are trying to run Elasticsearch embedded and this module is not available, you don't have access to stemmer.

You need to do real integration tests against a real Elasticsearch cluster. Like I explained there:

karejonsson · November 29, 2017, 2:00pm

Thank you David. I am very grateful. It is very nice of you to provide this answer.

Here is something that works. Pass CommonAnalysisPlugin to the node along with Netty4Plugin:

Arrays.asList(new Class[] {Netty4Plugin.class, CommonAnalysisPlugin.class}));

with

		<dependency>
		    <groupId>org.codelibs.elasticsearch.module</groupId>
		    <artifactId>analysis-common</artifactId>
		    <version>6.0.0</version>
		</dependency>
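
Since the embedded node only knows the filter types contributed by the plugin classes it is given, it can help to check that those classes are really on the classpath before starting the node. The following is a minimal sketch, not part of the setup above: the helper class PluginClasspathCheck is hypothetical, and the fully qualified names org.elasticsearch.transport.Netty4Plugin and org.elasticsearch.analysis.common.CommonAnalysisPlugin are assumptions that should be verified against the jars actually in use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PluginClasspathCheck {

    // Returns true if the named class can be found on the current classpath.
    // The class is looked up without being initialized.
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, PluginClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Returns the subset of the given class names that cannot be loaded.
    public static List<String> missing(List<String> classNames) {
        List<String> notFound = new ArrayList<>();
        for (String name : classNames) {
            if (!isOnClasspath(name)) {
                notFound.add(name);
            }
        }
        return notFound;
    }

    public static void main(String[] args) {
        // Assumed class names; verify them against the jars you depend on.
        List<String> required = Arrays.asList(
                "org.elasticsearch.transport.Netty4Plugin",
                "org.elasticsearch.analysis.common.CommonAnalysisPlugin");
        List<String> notFound = missing(required);
        if (notFound.isEmpty()) {
            System.out.println("All plugin classes found");
        } else {
            System.out.println("Missing plugin classes: " + notFound);
        }
    }
}
```

If CommonAnalysisPlugin shows up as missing, the analysis-common dependency is probably not being resolved, and the "Unknown filter type [stemmer]" error would be expected.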

karejonsson · December 11, 2017, 8:10am

Here is some more

            Arrays.asList(new Class[] { Netty4Plugin.class, CommonAnalysisPlugin.class, AnalysisICUPlugin.class,PainlessPlugin.class})

with

	<dependency>
		<groupId>org.codelibs.elasticsearch.module</groupId>
		<artifactId>analysis-common</artifactId>
		<version>6.0.0-rc2</version>
	</dependency>

karejonsson · December 11, 2017, 8:20am

Here is my general point of view. Embedded execution makes sense, and the Elastic team is strongly against something they should not be against. My rationale:

I can make projects that build and run their tests straight from the repo, with no environment setup.
With 200,000 PDF docs I can still do the indexing with only 4 GB of RAM. That is plenty to count as a relevant application.
Simple deployment and upgrades with one jar, using Spring Boot.

I am no longer waiting for answers in this thread so I'll mark it solved after this.

system · January 8, 2018, 8:21am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
