I'm trying to get basic English stemming to work.
Words like 'and' get filtered out, but I can't stem plurals or conjugations.
Here's a recreation of an unexpected result:
I've tried several alternative configs from the docs. Any ideas?
Cheers,
Chris
Hi Chris
Here's a recreation of an unexpected result:
stemming problem with snowball · GitHub
Unfortunately, index settings will accept and store any values.
Elasticsearch doesn't throw an error if it doesn't recognise a setting.
Try this to set the default analyzer for your index to 'english' (which
is a lighter version of the snowball stemmer):
curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1' -d '
{
   "settings" : {
      "analysis" : {
         "analyzer" : {
            "default" : {
               "type" : "english"
            }
         }
      }
   }
}
'
Then you can test it with the analyze API:
curl -XGET 'http://127.0.0.1:9200/test/_analyze?pretty=1&text=the+fox+jumps'
[Sun Jul 3 14:27:44 2011] Response:
{
   "tokens" : [
      {
         "end_offset" : 7,
         "position" : 2,
         "start_offset" : 4,
         "type" : "",
         "token" : "fox"
      },
      {
         "end_offset" : 13,
         "position" : 3,
         "start_offset" : 8,
         "type" : "",
         "token" : "jump"
      }
   ]
}
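Since unrecognised settings are stored without complaint, the analyze output is the only reliable sign that the analyzer took effect. Here is a minimal Python sketch of that check, run against the response shown above. The names `SAMPLE_RESPONSE` and `analyzed_tokens` are made up for illustration and are not part of any Elasticsearch client.

```python
import json

# The _analyze response from this thread, pasted in as a string.
SAMPLE_RESPONSE = """
{
   "tokens" : [
      {"end_offset" : 7, "position" : 2, "start_offset" : 4,
       "type" : "", "token" : "fox"},
      {"end_offset" : 13, "position" : 3, "start_offset" : 8,
       "type" : "", "token" : "jump"}
   ]
}
"""


def analyzed_tokens(response_text):
    """Return the token strings from an _analyze response body (hypothetical helper)."""
    body = json.loads(response_text)
    return [entry["token"] for entry in body["tokens"]]


tokens = analyzed_tokens(SAMPLE_RESPONSE)

# 'the' was dropped as a stopword and 'jumps' was stemmed to 'jump'.
print(tokens)
```

If 'jumps' came back unchanged, or 'the' were still present, that would suggest the analyzer settings were stored but not recognised.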
clint
Thanks!
Yeah, that's a bit tricky. I hadn't really expected an error message, but
when I saw the settings in there (via elasticsearch-head > info > metadata >
settings), I'd assumed that they were correctly recognised.
Anyway, this does work, and for now I don't need anything more fancy.
Cheers,
Chris