ElasticSearch with stemming/snowball


(Torsten) #1

Hello,

I am using ElasticSearch (0.15.0) as a search engine with full-text
search and word highlighting. It works but I'm not able to configure
an analyser with word stemming.
I know that my problem is very similar to
http://stackoverflow.com/questions/4981001/why-elasticsearch-is-not-finding-my-term.
I tried the solution suggested there, but it still doesn't work.

My cluster metadata looks like this:
metadata: {
  templates: { }
  indices: {
    test_index: {
      state: open
      settings: {
        index.analysis.filter.snowball.language: English
        index.analysis.analyzer.search_analyzer.filter.0: lowercase
        index.analysis.analyzer.search_analyzer.filter.1: snowball
        index.analysis.analyzer.index_analyzer.filter.0: lowercase
        index.analysis.filter.snowball.type: snowball
        index.analysis.analyzer.search_analyzer.tokenizer: nGram
        index.analysis.analyzer.index_analyzer.tokenizer: nGram
        index.analysis.analyzer.index_analyzer.filter.1: snowball
        index.number_of_shards: 5
        index.number_of_replicas: 1
      }
      mappings: {
        test_item: {
          properties: {
            title: {
              store: yes
              analyzer: standard
              term_vector: with_positions_offsets
              type: string
            }
            description: {
              store: yes
              analyzer: standard
              term_vector: with_positions_offsets
              type: string
            }
            link: {
              type: string
            }
          }
        }
      }
      aliases: [ ]
    }
  }
}

And my query (it finds entries containing "apple", but a search for "apples" finds nothing):
curl -XGET 'http://localhost:9200/_search?pretty=true' -d '
{
  "highlight": {
    "tags_schema": "styled",
    "fields": {
      "title": {
        "number_of_fragments": 0
      },
      "description": {
        "number_of_fragments": 0
      }
    }
  },
  "query": {
    "query_string": {
      "fields": ["title", "description"],
      "query": "apple"
    }
  }
}'

https://gist.github.com/849684

Any hints? Thanks in advance!


(Shay Banon) #2

Can you gist a curl recreation, including the curl command that created the index with the relevant mapping and analyzer config?


(Torsten) #3

Sure: https://gist.github.com/851600


(Shay Banon) #4

Heya,

You don't specify the analyzer setting in the mapping, and analyzers named index_analyzer and search_analyzer do not become the defaults that mappings use when none is specified. For that, you can name the analyzer "default" (used for both index and search), or use "default_index" and "default_search".
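
As a rough sketch of the first suggestion: the settings below register an analyzer under the name "default", so that fields without an explicit analyzer would pick it up at both index and search time. The Python is only there to build and print the JSON body. The index name test_index and the snowball filter come from the thread; the standard tokenizer (in place of nGram) and the helper name default_analyzer_settings are assumptions made for illustration.

```python
import json


def default_analyzer_settings(language="English"):
    """Build index settings whose analyzer is registered as "default".

    Assumption: a standard tokenizer followed by lowercase + snowball.
    The config posted in the thread used an nGram tokenizer instead.
    """
    return {
        "settings": {
            "index": {
                "analysis": {
                    "filter": {
                        # Same filter definition as in the posted settings.
                        "snowball": {"type": "snowball", "language": language}
                    },
                    "analyzer": {
                        "default": {
                            "type": "custom",
                            "tokenizer": "standard",
                            "filter": ["lowercase", "snowball"],
                        }
                    },
                }
            }
        }
    }


if __name__ == "__main__":
    # Body for something like:
    #   curl -XPUT 'http://localhost:9200/test_index' -d '<printed JSON>'
    print(json.dumps(default_analyzer_settings(), indent=2))
```

To use different analyzers for indexing and searching, the same structure should work with the names default_index and default_search in place of default.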

Also, make sure, using the analyze API, that you are getting what you are after, since combining ngram with stemming might make little sense (especially with the default ngram settings).
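
One way to run that check is to send both word forms through the analyze API and compare the tokens that come back. The sketch below only builds the request URL and compares parsed responses. It assumes the query-string form of the API (analyzer as a URL parameter, text as the request body), which newer Elasticsearch versions replace with a JSON body, and it assumes the response carries a "tokens" list. The helper names are hypothetical.

```python
from urllib.parse import urlencode


def analyze_url(index, analyzer, host="http://localhost:9200"):
    """URL for the analyze API of one index (query-string form, an assumption)."""
    return "%s/%s/_analyze?%s" % (host, index, urlencode({"analyzer": analyzer}))


def token_texts(response):
    """Extract the token strings from a parsed analyze response."""
    return [t["token"] for t in response.get("tokens", [])]


def same_tokens(response_a, response_b):
    """True if two analyze responses produced identical token sequences."""
    return token_texts(response_a) == token_texts(response_b)


if __name__ == "__main__":
    # e.g. curl -XGET '<printed URL>' -d 'apples', then again with 'apple'
    print(analyze_url("test_index", "default"))
```

If stemming is applied as intended, the responses for "apple" and "apples" should contain the same tokens. With an nGram tokenizer in front of the stemmer you would instead expect to see many short fragments.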

If you want, you can use a multi_field mapping, with one field indexed using just ngram and one using just stemming.
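
A minimal sketch of what such a multi_field mapping might look like, again using Python only to build the JSON. The analyzer names stem_analyzer and ngram_analyzer are hypothetical and would have to be defined in the index settings; the sub-field name ngram is also just an illustration. Note that multi_field is the mapping type of older releases; newer Elasticsearch versions express the same idea with a fields parameter on the field itself.

```python
import json


def multi_field_mapping(name, stem_analyzer, ngram_analyzer):
    """Mapping for one property indexed twice: stemmed and ngrammed."""
    return {
        name: {
            "type": "multi_field",
            "fields": {
                # Default sub-field, addressed by the plain field name.
                name: {"type": "string", "analyzer": stem_analyzer},
                # Second sub-field, addressed as "<name>.ngram".
                "ngram": {"type": "string", "analyzer": ngram_analyzer},
            },
        }
    }


if __name__ == "__main__":
    properties = {}
    for field in ("title", "description"):
        properties.update(
            multi_field_mapping(field, "stem_analyzer", "ngram_analyzer")
        )
    print(json.dumps({"test_item": {"properties": properties}}, indent=2))
```

A query could then list both "title" and "title.ngram" in its fields to search the stemmed and the ngram variants side by side.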

-shay.banon


(Torsten) #5

Now it works, thanks a lot for your help!


(Rich Kroll) #6

Torsten,
Can you gist your config so that others stumbling on this thread can
see what the proper mapping looks like? This seems to be a recurring
issue for people (myself included), so it would be great to have an
example of how to properly configure this.

Thanks!

Rich


(stelios) #7

That would be great :)


(Torsten) #8

Hi Rick,

Of course, I'm glad if I can help. :)

Disclaimer: I am new to ES/Lucene and have no experience with it.
Therefore it is quite possible that some settings are wrong or don't
make sense.


(Torsten) #9

Hi Rich,

Of course, I'm glad if I can help. :)

Disclaimer: I am new to ES/Lucene and have no experience with it.
Therefore it is quite possible that some settings are wrong or don't
make sense.

