Stemming

Hello

I'm trying to use Swedish stemming in Elasticsearch and I keep running into
problems with it. I could use some advice on how to deal with this.

The main problem is that the stemmers stem some words in odd ways, which
makes my hits either go through the roof or not match at all.
At first I used the "Swedish" snowball filter, but it seemed to stem a bit
"too hard".

Example:
Original => Stemmed
"ledare" => "led"
"ledig" => "led"
"ledighet" => "led"
"ledighetens" => "led"

None of these words should really be stemmed to "led". So I changed to the
"light_swedish" filter instead. It seems to be a bit more conservative in
its stemming, which I like:

Original => Stemmed
"ledare" => "led"
"ledig" => "ledig"
"ledighet" => "ledig"
"ledighetens" => "ledig"

But with other words it does not do such a good job:

Original => Stemmed
"förmån" => "förma"
"förmåner" => "förmån"
"förmånen" => "förmån"
...

All of these should be stemmed to "förmån".
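
To keep track of cases like these, a small check script can compare what the stemmer produced against what I would want. This is only a sketch in plain Python: the observed pairs are copied from the light_swedish tables above, while the `expected` mapping and the function names are my own assumptions, not anything Elasticsearch provides.

```python
from collections import defaultdict

# (original, stem) pairs as observed with the light_swedish filter,
# copied from the tables in this post.
observed = [
    ("ledare", "led"),
    ("ledig", "ledig"),
    ("ledighet", "ledig"),
    ("ledighetens", "ledig"),
    ("förmån", "förma"),
    ("förmåner", "förmån"),
    ("förmånen", "förmån"),
]

# Assumption: the stems I would like to get for the "förmån" family.
expected = {
    "förmån": "förmån",
    "förmåner": "förmån",
    "förmånen": "förmån",
}


def find_mismatches(pairs, wanted):
    """Return (word, actual_stem, wanted_stem) for every word whose stem differs."""
    return [
        (word, stem, wanted[word])
        for word, stem in pairs
        if word in wanted and stem != wanted[word]
    ]


def group_by_stem(pairs):
    """Group the original words by the stem they were reduced to."""
    groups = defaultdict(list)
    for word, stem in pairs:
        groups[stem].append(word)
    return dict(groups)


mismatches = find_mismatches(observed, expected)
groups = group_by_stem(observed)

for word, actual, wanted in mismatches:
    print(f"{word}: got {actual!r}, wanted {wanted!r}")
```

With the data above, the only mismatch reported is "förmån" itself, which ends up in a different group from its inflected forms.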

I get that the stemmers are not perfect, but how can I make the best of it?
How do you handle things like this?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/960d182d-f212-4a94-9b7c-bac36644829d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Maybe this will help you:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-lang-analyzer.html#swedish-analyzer
In the example there, they use only "swedish".
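
If only a handful of words keep going wrong, another option might be to pin them with a "stemmer_override" token filter placed before the stemmer, so the stemmer leaves them alone. Below is a rough sketch in Python that only builds the index settings body and prints it; nothing is sent to a cluster. The filter names, the analyzer name and the two rules are made up for illustration (the words come from your examples), and I am assuming your Elasticsearch version ships the "stemmer_override" filter and accepts "light_swedish" as the stemmer language.

```python
import json


def swedish_index_settings(override_rules):
    """Build an index settings body with overrides applied before the stemmer."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Hypothetical name; the rules fix the stem for selected
                    # words and protect them from the stemmer that follows.
                    "swedish_overrides": {
                        "type": "stemmer_override",
                        "rules": list(override_rules),
                    },
                    # Hypothetical name; the light stemmer from the question.
                    "swedish_light_stemmer": {
                        "type": "stemmer",
                        "language": "light_swedish",
                    },
                },
                "analyzer": {
                    "swedish_custom": {
                        "type": "custom",
                        "tokenizer": "standard",
                        # Order matters: the override must come before the stemmer.
                        "filter": [
                            "lowercase",
                            "swedish_overrides",
                            "swedish_light_stemmer",
                        ],
                    }
                },
            }
        }
    }


# Illustrative rules only, in "word => stem" form.
rules = ["förmån => förmån", "ledare => ledare"]
body = swedish_index_settings(rules)
print(json.dumps(body, indent=2, ensure_ascii=False))
```

The printed JSON could then be used as the body when creating the index. The rule list would have to be maintained by hand, so this probably only makes sense for the words that matter most in your searches.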

(In reply to Linus Pettersson's message of Tuesday, 11 November 2014, 14:55:47 UTC+1.)
