How to set the analyzer?

My Lucene app (which I am converting to ElasticSearch) uses org.apache.lucene.analysis.snowball.SnowballAnalyzer as its analyzer. I like it for its stemming abilities. How can I get support for this (or another analyzer) in ElasticSearch? Thanks.

I am currently on ElasticSearch 0.7.1.

http://www.elasticsearch.com/docs/elasticsearch/index_modules/analysis/

On Tue, May 25, 2010 at 10:24 PM, John Chang jchangkihtest2@gmail.com wrote:

View this message in context:
http://elasticsearch-users.115913.n3.nabble.com/How-to-set-the-anlyzer-tp842952p842952.html
Sent from the Elasticsearch Users mailing list archive at Nabble.com.

I looked in the org.elasticsearch.index.analysis directory and could not find any analyzer provider for the SnowballAnalyzer. So, using the BrazilianAnalyzerProvider as an example, I tried to create my own, but I keep getting this error upon startup:
"1) Could not find a suitable constructor in com.kiha.server.services.index.KihaSnowballAnalyzer. Classes must have either one (and only one) constructor annotated with @Inject or a zero-argument constructor that is not private."

However, my class does have one constructor with an @Inject annotation. This is what I did:

I tried adding this to my elasticsearch.json:
index:
  analysis:
    analyzer:
      standard:
        type: com.mycompany.server.services.index.MyCompanySnowballAnalyzerProvider

I put com.mycompany.server.services.index.MyCompanySnowballAnalyzerProvider into a jar in the ./lib directory. The class has this constructor:

@Inject public MyCompanySnowballAnalyzerProvider(Index index, @IndexSettings Settings indexSettings, @Assisted String name, @Assisted Settings settings) {
    super(index, indexSettings, name);
    analyzer = new SnowballAnalyzer(Version.LUCENE_CURRENT, "English");
}

The class extends AbstractAnalyzerProvider:
public class MyCompanySnowballAnalyzerProvider extends AbstractAnalyzerProvider
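
The rule quoted in the error message can be illustrated with plain reflection. The sketch below is not Guice's or ElasticSearch's actual implementation; `ConstructorRuleSketch`, its nested `Inject` and `LookAlike.Inject` annotations, and `hasSuitableConstructor` are hypothetical names made up for illustration. It assumes the container looks for one specific annotation type, so a constructor carrying a different annotation that merely shares the simple name `Inject` would not count. Whether that is what is happening in this thread is not established here.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;
import java.lang.reflect.Modifier;

public class ConstructorRuleSketch {

    // Hypothetical stand-in for the annotation type the container looks for.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.CONSTRUCTOR)
    public @interface Inject {}

    // Hypothetical look-alike: same simple name, but a different type.
    public static class LookAlike {
        @Retention(RetentionPolicy.RUNTIME)
        @Target(ElementType.CONSTRUCTOR)
        public @interface Inject {}
    }

    // Applies the rule from the error message: exactly one constructor
    // annotated with injectType, or else a non-private zero-argument one.
    public static boolean hasSuitableConstructor(Class<?> type,
                                                 Class<? extends Annotation> injectType) {
        int annotated = 0;
        boolean usableNoArg = false;
        for (Constructor<?> c : type.getDeclaredConstructors()) {
            if (c.isAnnotationPresent(injectType)) {
                annotated++;
            }
            if (c.getParameterCount() == 0 && !Modifier.isPrivate(c.getModifiers())) {
                usableNoArg = true;
            }
        }
        return annotated == 1 || (annotated == 0 && usableNoArg);
    }

    // One constructor with the expected annotation: accepted.
    public static class Good {
        @Inject public Good(String name) {}
    }

    // One constructor with the look-alike annotation: rejected.
    public static class WrongAnnotation {
        @LookAlike.Inject public WrongAnnotation(String name) {}
    }

    // No annotation, but a non-private zero-argument constructor: accepted.
    public static class NoArg {
        public NoArg() {}
    }

    public static void main(String[] args) {
        System.out.println(hasSuitableConstructor(Good.class, Inject.class));
        System.out.println(hasSuitableConstructor(WrongAnnotation.class, Inject.class));
        System.out.println(hasSuitableConstructor(NoArg.class, Inject.class));
    }
}
```

Running `main` prints `true`, `false`, `true` on separate lines. Printing `c.getAnnotations()` for each declared constructor of a real provider class is a similar way to see the fully qualified annotation types it actually carries.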

Typo: in the above message, I meant to replace "kiha" with "mycompany" throughout, but missed some places. So "Could not find a suitable constructor in com.kiha.server.services.index.KihaSnowballAnalyzer." should be read as "Could not find a suitable constructor in com.mycompany.server.services.index.MyCompanySnowballAnalyzerProvider." in order to correspond to the rest of my post. Thanks.

Strange, this should work. You get an exception for
com.kiha.server.services.index.KihaSnowballAnalyzer,
while I would expect the exception to be for
com.kiha.server.services.index.KihaSnowballAnalyzerProvider
(note the Provider at the end). Which one are you actually getting?

On Thu, May 27, 2010 at 12:23 AM, John Chang jchangkihtest2@gmail.com wrote:

I am getting an exception from com.kiha.server.services.index.KihaSnowballAnalyzer. Although perhaps oddly named, this class IS an extension of AbstractAnalyzerProvider. It is declared as:

public class KihaSnowballAnalyzer extends AbstractAnalyzerProvider

I do not have a class named KihaSnowballAnalyzerProvider.

Can you write a quick test case? I will then check why this happens.

On Wed, Jun 2, 2010 at 1:29 AM, John Chang jchangkihtest2@gmail.com wrote:

I can't write a JUnit test case to reproduce the issue, because I can't get the server to start up without error. The test is simply to start up the server and look for errors. If you want to reproduce it, all I can think of is to run my class and config. To repro:

  1. Use 0.7.1 (there is no particular reason I'm not on the latest, just that it changes often; I can try with the latest if you advise)

  2. Add this to your elasticsearch.yml:
    index:
      analysis:
        analyzer:
          standard:
            type: com.kiha.server.services.index.KihaSnowballAnalyzer

  3. Compile this class and put it in your classpath. Note that it is modeled on the BrazilianAnalyzer I found. Note also that to get it to compile, you need lucene-snowball-3.0.0-sources.jar.

Thanks for your time.

package com.kiha.server.services.index;

import com.google.common.collect.ImmutableSet;
import com.google.common.collect.Iterators;
import com.google.inject.Inject;
import com.google.inject.assistedinject.Assisted;
//import org.apache.lucene.analysis.br.BrazilianAnalyzer;
import org.apache.lucene.analysis.snowball.SnowballAnalyzer;
import org.apache.lucene.util.Version;
import org.elasticsearch.index.Index;
import org.elasticsearch.index.analysis.AbstractAnalyzerProvider;
import org.elasticsearch.index.settings.IndexSettings;
import org.elasticsearch.util.settings.Settings;

import java.util.Set;

public class KihaSnowballAnalyzer extends AbstractAnalyzerProvider {

    private final Set<?> stopWords;

    private final Set<?> stemExclusion;

    private final SnowballAnalyzer analyzer;

    @Inject public KihaSnowballAnalyzer(Index index, @IndexSettings Settings indexSettings, @Assisted String name, @Assisted Settings settings) {
        super(index, indexSettings, name);
        String[] stopWords = settings.getAsArray("stopwords");
        if (stopWords.length > 0) {
            this.stopWords = ImmutableSet.copyOf(Iterators.forArray(stopWords));
        } else {
            this.stopWords = ImmutableSet.copyOf(Iterators.forArray());
            //this.stopWords = BrazilianAnalyzer.getDefaultStopSet();
        }

        String[] stemExclusion = settings.getAsArray("stem_exclusion");
        if (stemExclusion.length > 0) {
            this.stemExclusion = ImmutableSet.copyOf(Iterators.forArray(stemExclusion));
        } else {
            this.stemExclusion = ImmutableSet.of();
        }
        analyzer = new SnowballAnalyzer(Version.LUCENE_CURRENT, "English");
    }

    @Override public SnowballAnalyzer get() {
        return this.analyzer;
    }

}
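
The nesting of the analyzer setting in step 2 matters because, as far as I understand, nested settings are read as flat dotted keys such as index.analysis.analyzer.standard.type. The sketch below only illustrates that flattening with plain Java maps; it is not ElasticSearch's settings loader, and `SettingsFlattenSketch`, `flatten`, and `single` are hypothetical names introduced for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SettingsFlattenSketch {

    // Turns a nested map into a flat map whose keys are dot-separated paths.
    public static Map<String, String> flatten(Map<String, Object> nested) {
        Map<String, String> flat = new LinkedHashMap<>();
        flattenInto("", nested, flat);
        return flat;
    }

    private static void flattenInto(String prefix, Map<String, Object> node,
                                    Map<String, String> out) {
        for (Map.Entry<String, Object> e : node.entrySet()) {
            String key = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            Object value = e.getValue();
            if (value instanceof Map) {
                // Descend one level, extending the dotted prefix.
                @SuppressWarnings("unchecked")
                Map<String, Object> child = (Map<String, Object>) value;
                flattenInto(key, child, out);
            } else {
                out.put(key, String.valueOf(value));
            }
        }
    }

    // Helper (hypothetical) that builds a one-entry map, standing in for one YAML level.
    public static Map<String, Object> single(String key, Object value) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put(key, value);
        return m;
    }

    public static void main(String[] args) {
        // Mirrors the nesting of the config in step 2.
        Map<String, Object> config =
            single("index",
                single("analysis",
                    single("analyzer",
                        single("standard",
                            single("type",
                                "com.kiha.server.services.index.KihaSnowballAnalyzer")))));
        System.out.println(flatten(config));
    }
}
```

Running `main` prints `{index.analysis.analyzer.standard.type=com.kiha.server.services.index.KihaSnowballAnalyzer}`. If the indentation of the config is lost, every key ends up at the top level and no such dotted path exists.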