Unable to configure Porter stem analyser


(WeeJames) #1

I'm looking at using Elasticsearch to provide the search functions of
our site.

I've been experimenting with it but am unable to enable the porterStem
analyser (so that a search for "fight" also matches "fights" and "fighting").

Here's a rundown of what I'm doing:

curl -XPUT localhost:9200/local/ -d'
index :
    analysis :
        analyzer :
            stemming :
                type : custom
                tokenizer : standard
                filter : [standard, lowercase, stop, porterStem]
'

curl -XPUT localhost:9200/local/_mapping -d'{"properties":
  { "title" : { "analyzer" : "stemming", "type" : "string" }}}'

curl -XPUT localhost:9200/local/article/1 -d'{"title": "Fight for your life"}'
curl -XPUT localhost:9200/local/article/2 -d'{"title": "Fighting for your life"}'
curl -XPUT localhost:9200/local/article/3 -d'{"title": "My dad fought a dog"}'
curl -XPUT localhost:9200/local/article/4 -d'{"title": "Bruno fights Tyson tomorrow"}'

However, a search for 'fight' only matches the first document, the one
that contains the exact term:

curl -XGET 'localhost:9200/local/_search?q=fight'

The settings appear to have been applied correctly, but stemming doesn't
seem to work:

  "indices" : {
    "local" : {
      "aliases" : [ ],
      "settings" : {
        "index.analysis.analyzer.stemming.type" : "custom",
        "index.analysis.analyzer.stemming.tokenizer" : "standard",
        "index.analysis.analyzer.stemming.filter.1" : "lowercase",
        "index.analysis.analyzer.stemming.filter.0" : "standard",
        "index.analysis.analyzer.stemming.filter.3" : "porterStem",
        "index.analysis.analyzer.stemming.filter.2" : "stop",
        "index.number_of_shards" : "5",
        "index.number_of_replicas" : "1"
      },

Has anyone got this working who could point me in the right
direction?


(Shay Banon) #2

Here is a gist with a sample of how it works: https://gist.github.com/879883.

A few notes on what you're doing:

  1. The put mapping request should be issued on both the index and the type: PUT localhost:9200/local/article/_mapping.
  2. The body of the put mapping request should have the type as its top-level key.
  3. By default, when you search without specifying a field, the _all field is searched, and with your mappings _all is not stemmed; only title is.
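Applied to the commands in the original post, the three notes might look roughly like the sketch below. It is a minimal sketch, assuming the 0.x-era REST API used in this thread (typed mappings and the "string" field type, both of which later Elasticsearch versions replaced), that the "local" index has already been created with the "stemming" analyzer, and that the mapping is put before the articles are indexed, since an existing field's analyzer generally can't be changed in place. ES_URL, put_mapping and search_title are hypothetical helper names, not part of Elasticsearch.

```shell
#!/bin/sh
# Sketch only: assumes an Elasticsearch 0.x-era node (typed mappings, "string"
# fields) and the "local" index already created with the "stemming" analyzer.
# ES_URL, put_mapping and search_title are hypothetical names for illustration.
ES_URL="${ES_URL:-localhost:9200}"

# Note 2: the type name ("article") is the top-level key of the mapping body.
MAPPING_BODY='{
  "article": {
    "properties": {
      "title": { "type": "string", "analyzer": "stemming" }
    }
  }
}'

# Note 1: put the mapping on the index *and* the type, before indexing documents.
put_mapping() {
  curl -XPUT "$ES_URL/local/article/_mapping" -d "$MAPPING_BODY"
}

# Note 3: search the stemmed "title" field explicitly instead of the default _all.
search_title() {
  curl -XGET "$ES_URL/local/_search?q=title:$1"
}
```

Usage: run put_mapping, re-run the four article PUTs, then search_title fight. With the Porter stemmer, articles 1, 2 and 4 would be expected to match; "fought" is an irregular form that a suffix-stripping stemmer leaves unchanged, so article 3 probably won't.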

-shay.banon
(In reply to WeeJames, Monday, March 21, 2011 at 6:27 PM)


(Torsten) #3

Hi,

I had a similar problem with the Snowball analyzer. With Shay's help I
was able to get it running. You can find the example here:

http://groups.google.com/a/elasticsearch.com/group/users/browse_thread/thread/b94f58fef95db90e/

(In reply to WeeJames, Mar 21, 5:27 pm)


(WeeJames) #4

Thanks, I've got it up and running now thanks to the gist. Is there
any way to apply the index-level mapping through a config file (JSON or
YAML)?

On Mar 21, 7:22 pm, Torsten admiralc...@gmail.com wrote:

Hi,

I had a similar problem with the Snowball analyzer. With Shay's help I
was able to get it running. You can find the example here:

http://groups.google.com/a/elasticsearch.com/group/users/browse_threa...
https://gist.github.com/853994

