English stemming


(Chris Berkhout) #1

I'm trying to get basic English stemming to work.

Words like 'and' get filtered out, but I can't stem plurals or conjugations.

Here's a recreation of an unexpected result:
https://gist.github.com/5021c5c7ef129be73346

I've tried several alternative configs from the docs. Any ideas?

Cheers,
Chris


(Clinton Gormley) #2

Hi Chris

> Here's a recreation of an unexpected result:
> https://gist.github.com/5021c5c7ef129be73346

Unfortunately, index settings will accept any values, and store them.
It doesn't throw an error if it doesn't recognise the settings.

Try this to set the default analyzer for your index to 'english' (which
is a lighter version of the snowball stemmer):

curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1' -d '
{
   "settings" : {
      "analysis" : {
         "analyzer" : {
            "default" : {
               "type" : "english"
            }
         }
      }
   }
}
'

Then you can test it with the analyze API:

curl -XGET 'http://127.0.0.1:9200/test/_analyze?pretty=1&text=the+fox+jumps'

[Sun Jul 3 14:27:44 2011] Response:

{
   "tokens" : [
      {
         "end_offset" : 7,
         "position" : 2,
         "start_offset" : 4,
         "type" : "",
         "token" : "fox"
      },
      {
         "end_offset" : 13,
         "position" : 3,
         "start_offset" : 8,
         "type" : "",
         "token" : "jump"
      }
   ]
}
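
To check other plurals and conjugations the same way, a small helper can build the analyze URL for arbitrary text. This is only a minimal sketch: `analyze_url` is a hypothetical name (not part of Elasticsearch), it assumes the same host, port and `test` index as the commands above, and it only encodes spaces as `+`, so it is not a general URL encoder.

```shell
# Hypothetical helper: print the _analyze URL for an index and some text.
# Assumes a node at 127.0.0.1:9200, as in the commands above.
# Only spaces are encoded (as '+'); other special characters are left alone.
analyze_url() {
  index="$1"
  shift
  text=$(printf '%s' "$*" | tr ' ' '+')
  printf 'http://127.0.0.1:9200/%s/_analyze?pretty=1&text=%s\n' "$index" "$text"
}

# Print the URL for a phrase containing a plural and a past tense.
analyze_url test the foxes jumped
```

The printed URL can then be passed to curl, for example `curl -XGET "$(analyze_url test the foxes jumped)"`, and the returned tokens compared with the stems you expect.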

clint


(Chris Berkhout) #3

Thanks!

Yeah, that's a bit tricky. I hadn't really expected an error message, but
when I saw the settings in there (via elasticsearch-head > info > metadata >
settings), I'd assumed that they were correctly recognised.

Anyway, this does work, and for now I don't need anything more fancy.

Cheers,
Chris


