English stemming

I'm trying to get basic English stemming to work.

Words like 'and' get filtered out, but I can't stem plurals or conjugations.

Here's a recreation of an unexpected result:
stemming problem with snowball · GitHub

I've tried several alternative configs from the docs. Any ideas?

Cheers,
Chris

Hi Chris

Here's a recreation of an unexpected result:
stemming problem with snowball · GitHub

Unfortunately, the index settings API accepts any values and stores them.
It doesn't throw an error if it doesn't recognise a setting.

Try this to set the default analyzer for your index to 'english' (which
is a lighter version of the snowball stemmer):

curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1' -d '
{
   "settings" : {
      "analysis" : {
         "analyzer" : {
            "default" : {
               "type" : "english"
            }
         }
      }
   }
}
'
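
For anyone scripting this rather than using curl, the same request could be sketched in Python with only the standard library. The function names are hypothetical, and the Content-Type header is an addition that is not in the curl command above: as far as I know, newer Elasticsearch releases reject a JSON body sent without it. The host and index name are the ones from the command above.

```python
import json
import urllib.request


def build_create_index_request(base_url="http://127.0.0.1:9200", index="test"):
    """Build (but do not send) a PUT request that creates an index
    whose default analyzer is 'english'."""
    settings = {
        "settings": {
            "analysis": {
                "analyzer": {
                    "default": {"type": "english"}
                }
            }
        }
    }
    return urllib.request.Request(
        f"{base_url}/{index}/?pretty=1",
        data=json.dumps(settings).encode("utf-8"),
        method="PUT",
        # Assumption: needed on newer versions, harmless on older ones.
        headers={"Content-Type": "application/json"},
    )


def send(request):
    """Send a prepared request and return the response body as text.
    Requires a running Elasticsearch node at the request's URL."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling send(build_create_index_request()) against a running node should return the usual acknowledgement. Since unrecognised settings are stored without complaint, a successful response alone does not prove the analyzer is active.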

Then you can test it with the analyze API:

curl -XGET 'http://127.0.0.1:9200/test/_analyze?pretty=1&text=the+fox+jumps'

[Sun Jul 3 14:27:44 2011] Response:

{
   "tokens" : [
      {
         "end_offset" : 7,
         "position" : 2,
         "start_offset" : 4,
         "type" : "",
         "token" : "fox"
      },
      {
         "end_offset" : 13,
         "position" : 3,
         "start_offset" : 8,
         "type" : "",
         "token" : "jump"
      }
   ]
}
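
Because the settings are stored whether or not they are recognised, the analyze output is the thing to check. Here is a minimal sketch in Python that reads a response like the one above and uses the character offsets to show which input words were stemmed and which were dropped. The helper names are hypothetical, and the sample response is the one pasted above.

```python
import json

# The analyze response shown above, for the text "the fox jumps".
ANALYZE_RESPONSE = """
{
  "tokens": [
    {"end_offset": 7, "position": 2, "start_offset": 4, "type": "", "token": "fox"},
    {"end_offset": 13, "position": 3, "start_offset": 8, "type": "", "token": "jump"}
  ]
}
"""


def indexed_terms(text, response_json):
    """Map each surviving input word to the token the analyzer produced,
    slicing the original text with the start/end offsets of each token."""
    terms = {}
    for tok in json.loads(response_json)["tokens"]:
        original = text[tok["start_offset"]:tok["end_offset"]]
        terms[original] = tok["token"]
    return terms


def dropped_words(text, response_json):
    """Return whitespace-separated input words that produced no token
    (for example stopwords removed by the analyzer)."""
    kept = indexed_terms(text, response_json)
    return [word for word in text.split() if word not in kept]


if __name__ == "__main__":
    text = "the fox jumps"
    print(indexed_terms(text, ANALYZE_RESPONSE))   # {'fox': 'fox', 'jumps': 'jump'}
    print(dropped_words(text, ANALYZE_RESPONSE))   # ['the']
```

If stemming were not active, 'jumps' would map to itself instead of 'jump', which is a quick way to tell that a stored setting was not actually picked up.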

clint

Thanks!

Yeah, that's a bit tricky. I hadn't really expected an error message, but
when I saw the settings in there (via elasticsearch-head > info > metadata >
settings), I'd assumed that they were correctly recognised.

Anyway, this does work, and for now I don't need anything more fancy.

Cheers,
Chris

On Sun, Jul 3, 2011 at 8:28 PM, Clinton Gormley clinton@iannounce.co.uk wrote:
