The Porter Stemming Filter


(Jude) #1

Hi,

I'm having trouble getting the porterStem filter to work. With this
filter I'd expect a search for "stem" or "rule" to find a document
titled "testing stemming rules", but that doesn't seem to be
happening...

A transcript illustrating the problem follows. I've tried this with
both 0.6.0 and 0.7.0 built from today's source.

[jude@s2 elasticsearch]$ curl -XPUT localhost:9200/example.com -d'index :

analysis :
    analyzer :
        stemming :
            type : custom
            tokenizer : standard
            filter : [standard, lowercase, stop, porterStem]

'
{"ok":true,"acknowledged":true}

[jude@s2 elasticsearch]$ curl -XPUT localhost:9200/example.com/doc/_mapping -d'{"properties": { "title" : { "analyzer" : "stemming", "type" : "string" }}}'
{"ok":true,"acknowledged":true}

[jude@s2 elasticsearch]$ curl -XPUT localhost:9200/example.com/doc/test -d'{"title": "testing stemming rules"}'
{"ok":true,"_index":"example.com","_type":"doc","_id":"test"}

[jude@s2 elasticsearch]$ curl -XGET 127.0.0.1:9200/example.com/_search -d'{"query":{"query_string":{"query":"stem"}}}'
{"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"hits":[]}}

[jude@s2 elasticsearch]$ curl -XGET 127.0.0.1:9200/example.com/_search -d'{"query":{"query_string":{"query":"rule"}}}'
{"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"hits":[]}}
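One thing worth checking, offered as an assumption rather than a confirmed diagnosis: a `query_string` query that names no field may be run against the catch-all `_all` field, and `_all` may be analyzed with the default analyzer rather than with the `stemming` analyzer mapped on `title`. If so, "stem" would never match the unstemmed token "stemming" there, and naming the `title` field should change the result. The sketch below uses `default_field`, a standard `query_string` parameter (not verified against 0.6/0.7); `field_query` is a hypothetical helper that only prints the request body.

```shell
#!/bin/sh
# Hypothetical helper (not part of Elasticsearch): print a query_string
# request body that names the field to search. Assumption: without a
# field, the query hits _all, which is not analyzed with "stemming".
# No JSON escaping is done, so keep the query text free of quotes.
field_query() {
  field=$1   # field whose search analyzer should apply, e.g. title
  text=$2    # raw query text, e.g. stem
  printf '{"query":{"query_string":{"default_field":"%s","query":"%s"}}}\n' \
    "$field" "$text"
}

# Body for the failing "stem" search, scoped to the title field.
field_query title stem
```

To send it: `curl -XGET localhost:9200/example.com/_search -d "$(field_query title stem)"`. The inline form `{"query":{"query_string":{"query":"title:stem"}}}` should behave the same way.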

A status query shows that my custom analyzer has been registered, and
I've set the title property analyzer to "stemming", so I can't see why
it's not being used:

[www@S2 elasticsearch]$ curl -XGET 127.0.0.1:9200/getrevising.co.uk/_status?pretty=true
{
  "ok" : true,
  "_shards" : {
    "total" : 10,
    "successful" : 5,
    "failed" : 0
  },
  "indices" : {
    "getrevising.co.uk" : {
      "aliases" : [ ],
      "settings" : {
        "index.analysis.analyzer.stemming.type" : "custom",
        "index.analysis.analyzer.stemming.tokenizer" : "standard",
        "index.analysis.analyzer.stemming.filter.1" : "lowercase",
        "index.analysis.analyzer.stemming.filter.0" : "standard",
        "index.analysis.analyzer.stemming.filter.3" : "porterStem",
        "index.analysis.analyzer.stemming.filter.2" : "stop",
        "index.number_of_shards" : "5",
        "index.number_of_replicas" : "1"
      },
[...snip...]

I think I must be missing something obvious. Could anyone point me in
the right direction, please?

Cheers,
Jude



(Jude) #2

Hi guys,

Sorry to repost, but I'm still having no luck with the porterStem
filter. Does anyone else use it?

I can't see what I'm doing wrong here; ES seems perfect except for
this one thing. I could stem my text myself before indexing it, but
I'd rather let ES do it if it can.

The transcript follows. I've tested this against 0.6.0 and a build
freshly pulled from GitHub. I'm happy to open a new issue on GitHub if
this is a bug.

Thanks,
Jude

[jude@s2]$ curl -XPUT localhost:9200/example.com -d'index :

analysis :
    analyzer :
        stemming :
            type : custom
            tokenizer : standard
            filter : [standard, lowercase, porterStem, stop]

'
{"ok":true,"acknowledged":true}

[jude@s2]$ curl -XPUT localhost:9200/example.com/doc/_mapping -d'{"properties": { "title" : { "index_analyzer" : "stemming", "search_analyzer" : "stemming", "type" : "string" }}}'
{"ok":true,"acknowledged":true}

[jude@s2]$ curl -XPUT localhost:9200/example.com/doc/test -d'{"title": "testing stemming rules"}'
{"ok":true,"_index":"example.com","_type":"doc","_id":"test"}

[jude@s2]$ curl -XGET localhost:9200/example.com/_search -d'{"query":{"query_string":{"query":"stemming"}}}'
{"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"hits":[{"_index":"example.com","_type":"doc","_id":"test3","_source" : {"title": "testing stemming rules"}}]}}

[jude@s2]$ curl -XGET localhost:9200/example.com/_search -d'{"query":{"query_string":{"query":"stem"}}}'
{"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"hits":[]}}

[jude@s2]$ curl -XGET localhost:9200/example.com/_status?pretty=true
{
  "ok" : true,
  "_shards" : {
    "total" : 10,
    "successful" : 5,
    "failed" : 0
  },
  "indices" : {
    "example.com" : {
      "aliases" : [ ],
      "settings" : {
        "index.analysis.analyzer.stemming.type" : "custom",
        "index.analysis.analyzer.stemming.tokenizer" : "standard",
        "index.analysis.analyzer.stemming.filter.1" : "lowercase",
        "index.analysis.analyzer.stemming.filter.0" : "standard",
        "index.analysis.analyzer.stemming.filter.3" : "stop",
        "index.analysis.analyzer.stemming.filter.2" : "porterStem",
        "index.number_of_shards" : "5",
        "index.number_of_replicas" : "1"
      },
...


(hovo110) #3

I have exactly the same problem.

Also, there is no example on elasticsearch.com of how to set up stemming.
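For reference, here is a minimal sketch of a stemming setup assembled only from the settings and mapping keys already used in this thread: the custom `stemming` analyzer (written as nested YAML) and `index_analyzer`/`search_analyzer` on `title`. It has not been checked against any particular version, and the function names `stemming_settings` and `title_mapping` are hypothetical; each one only prints a request body.

```shell
#!/bin/sh
# Sketch of a stemming setup built from the analyzer in this thread.
# Each function only prints a request body; send it with curl.

# Index settings: custom analyzer "stemming" (filter chain from post #1).
stemming_settings() {
  cat <<'EOF'
index :
  analysis :
    analyzer :
      stemming :
        type : custom
        tokenizer : standard
        filter : [standard, lowercase, stop, porterStem]
EOF
}

# Mapping for type "doc": analyze "title" with "stemming" at both index
# and search time (same keys as the second transcript).
title_mapping() {
  printf '%s\n' '{"properties":{"title":{"type":"string","index_analyzer":"stemming","search_analyzer":"stemming"}}}'
}

stemming_settings
title_mapping
```

For example, `curl -XPUT localhost:9200/example.com -d "$(stemming_settings)"` followed by `curl -XPUT localhost:9200/example.com/doc/_mapping -d "$(title_mapping)"`. Searches may also need to name the field (for instance `title:stem`) if, as seems likely but is unconfirmed here, a query that names no field searches `_all` rather than `title`.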

