Hi Joaquin,
If I remember correctly, Lucene includes the Snowball family of
stemmers in its contrib package. PorterStemmer is included in ES, but
it's English only. I'll try to draft an example for you. I would
suggest that you look at how PorterStemmer is included in ES and
create similar classes to use Lucene's Snowball. Take a look at this
class:
- PorterStemTokenFilterFactory: the factory responsible for
creating and configuring the stemmer
Create a similar factory, e.g. SnowballTokenFilterFactory:
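    package org.elasticsearch.index.analysis;

    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.snowball.SnowballFilter;
    // ... plus the ES imports (Inject, Assisted, Index, IndexSettings,
    // Settings), which depend on your ES version

    public class SnowballTokenFilterFactory extends AbstractTokenFilterFactory {

        // Name of the Snowball stemmer to use, e.g. "Spanish"
        private String language;

        @Inject public SnowballTokenFilterFactory(Index index,
                @IndexSettings Settings indexSettings, @Assisted String name,
                @Assisted Settings settings) {
            super(index, indexSettings, name);
            // Read from the "language" key of this filter's settings
            this.language = settings.get("language");
        }

        // Wraps the incoming token stream with the Snowball stemmer
        @Override public TokenStream create(TokenStream tokenStream) {
            return new SnowballFilter(tokenStream, language);
        }
    }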
I haven't tried it, but it should work pretty much like that, with the
correct imports. It's important (or at least it was when I looked into
it some time ago) to use the package "org.elasticsearch.*" for these
factories. The "language" setting then goes in elasticsearch.yml,
like this:
    index:
      analysis:
        analyzer:
          my_analyzer:
            type: custom
            tokenizer: whitespace
            filter: [lowercase, asciifolding, snowball]
        filter:
          snowball:
            type: org.elasticsearch.index.analysis.SnowballTokenFilterFactory
            language: Spanish
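A note on the "language" value: it is passed straight to Lucene's SnowballFilter, which, as far as I know (Lucene 3.x), loads the class org.tartarus.snowball.ext.<language>Stemmer by reflection. So the name has to match the stemmer class exactly ("Spanish", not "spanish"). Here is a minimal Java sketch of that lookup rule. SnowballLanguageName and its methods are hypothetical helpers for illustration only, not part of Lucene or ES, and the package prefix is an assumption tied to that Lucene version.

```java
import java.util.Locale;

// Hypothetical helper, not part of Lucene or ES: shows how a "language"
// setting maps to the Snowball stemmer class name that gets loaded.
class SnowballLanguageName {

    // Assumption: package prefix used by SnowballFilter in Lucene 3.x.
    static final String STEMMER_PACKAGE = "org.tartarus.snowball.ext.";

    // Turns "spanish" or "SPANISH" into "Spanish" so it matches the class name.
    static String normalize(String language) {
        if (language == null || language.trim().isEmpty()) {
            throw new IllegalArgumentException("language must be set");
        }
        String trimmed = language.trim();
        return trimmed.substring(0, 1).toUpperCase(Locale.ROOT)
                + trimmed.substring(1).toLowerCase(Locale.ROOT);
    }

    // Builds the fully qualified stemmer class name for a language setting.
    static String stemmerClassName(String language) {
        return STEMMER_PACKAGE + normalize(language) + "Stemmer";
    }

    public static void main(String[] args) {
        // prints org.tartarus.snowball.ext.SpanishStemmer
        System.out.println(stemmerClassName("spanish"));
    }
}
```

If you want the factory to be forgiving, you could apply something like normalize(...) to the setting before handing it to SnowballFilter, so a lowercase value in elasticsearch.yml still resolves.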
You need to package your custom classes, in this case
SnowballTokenFilterFactory, in a jar and put it in the lib directory of ES.
Please give it a try and let me know if you run into any problems.
Regards,
Sebastian.
On Dec 21, 11:03 am, Joaquin Cuenca Abela <joaq...@cuencaabela.com> wrote:
Hi,
I want to index some Spanish text, and it seems that I need to use
an external stemmer to do this, as ES/Lucene doesn't include a
Spanish stemmer by default.
Are there any docs on how to use an external stemmer (for instance the
Snowball ones) with Elasticsearch?
Thanks!
--
Joaquin Cuenca Abela