Can't get snowball analyzer to work via Java


(Jason-5) #1

Hey folks,

This is either a misconfiguration by me, or a misunderstanding (or both)
but I'm struggling to get the snowball analyzer to work.

When I create my index(es) I am specifying the following settings:

ImmutableSettings.settingsBuilder().loadFromSource(jsonBuilder()
    .startObject()
        .startObject("analysis")
            .startObject("analyzer")
                .startObject("custom")
                    .field("tokenizer", "standard")
                    .field("filter", new String[]{"standard", "lowercase", "snowball"})
                .endObject()
            .endObject()
            .startObject("filter")
                .startObject("snowball")
                    .field("type", "snowball")
                    .field("language", "English")
                .endObject()
            .endObject()
        .endObject()
    .endObject().string());

Then when I perform a search I am specifying the analyzer:

QueryBuilder query = queryString(queryString).analyzer("custom");

But it's not working (meaning, a search for the term "art" does not match
documents with the word "arts"). I have NOT yet manually added the
analyzer to the field definition in the mapping because I don't want to
define a specific language for the field (I don't know ahead of time what
language the content will be in).

I'm wondering if this is just the wrong approach. Do I HAVE to nominate a
specific analyzer for a single field? And if so, how would one go about
supporting multiple languages? (Multiple indexes, I guess?)

What I'm really looking for is a full working example of a "sensible"
configuration for ElasticSearch which will give me all the basic free text
search features, like stemming. Although the "a la carte" approach is
great, it would be nice if the default implementation served the most
predominant use case(s).

Thanks!

--


(David Pilato) #2

I think you indexed "arts" with the default analyzer, so the index contains the unstemmed term "arts". When you search for "art", there is nothing for it to match.
Specifying an analyzer at search time only means that your query string is analyzed before being compared with the index. So "Art" is analyzed to "art", but the indexed terms are not changed.

To make it work, you should apply your analyzer at index time as well. So, you need to define it on your field in the mapping.
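
To make the mismatch concrete, here is a minimal sketch in plain Java with no Elasticsearch dependency. Everything in it is an assumption made for illustration: the class name, the sample document, and especially the `stemmed` method, which only strips a trailing "s" and is not the Snowball algorithm. It shows that a stemmed query cannot match terms that were indexed unstemmed, and that the match appears once the same analysis is applied at index time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

class AnalyzerMismatchDemo {

    // Stand-in for a non-stemming analyzer: lowercase, split on non-letters.
    static List<String> plain(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Stand-in for a stemming analyzer: plain tokens with one trailing "s" removed.
    // This is NOT Snowball, only enough to reduce "arts" to "art".
    static List<String> stemmed(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : plain(text)) {
            boolean strip = t.length() > 3 && t.endsWith("s");
            tokens.add(strip ? t.substring(0, t.length() - 1) : t);
        }
        return tokens;
    }

    // Builds a tiny inverted index with indexAnalyzer, then looks up the
    // terms that searchAnalyzer produces for the query.
    static Set<Integer> search(List<String> docs,
                               Function<String, List<String>> indexAnalyzer,
                               Function<String, List<String>> searchAnalyzer,
                               String query) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : indexAnalyzer.apply(docs.get(id))) {
                index.computeIfAbsent(term, k -> new HashSet<>()).add(id);
            }
        }
        Set<Integer> hits = new HashSet<>();
        for (String term : searchAnalyzer.apply(query)) {
            hits.addAll(index.getOrDefault(term, new HashSet<>()));
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("Museum of Fine Arts");

        // Indexed without stemming, searched with stemming: "art" vs "arts".
        System.out.println(search(docs, AnalyzerMismatchDemo::plain,
                AnalyzerMismatchDemo::stemmed, "art")); // prints []

        // Same analysis on both sides: "art" vs "art".
        System.out.println(search(docs, AnalyzerMismatchDemo::stemmed,
                AnalyzerMismatchDemo::stemmed, "art")); // prints [0]
    }
}
```

The first call corresponds to the situation in the question (analyzer given only on the query), the second to defining the analyzer on the field so it also runs at index time.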

If you have multiple analyzers to apply to a field, I recommend using the cool multi-field feature:
http://www.elasticsearch.org/guide/reference/mapping/multi-field-type.html
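
As a rough sketch of what such a mapping could look like, the plain-Java snippet below prints a mapping body that uses the `multi_field` type with one unstemmed sub-field and one sub-field using the `custom` analyzer from the question. The type name `doc`, the field name `body`, and the sub-field name `stemmed` are hypothetical, and the JSON layout reflects my understanding of the `multi_field` type described at the link above, so please check it against the reference for your version. The small JSON writer is only there to keep the example dependency-free.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MultiFieldMappingSketch {

    // Builds an insertion-ordered map from alternating key/value arguments.
    static Map<String, Object> obj(Object... kv) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) {
            m.put((String) kv[i], kv[i + 1]);
        }
        return m;
    }

    // Serializes nested maps and strings to JSON.
    // No escaping is done; the sketch only uses plain identifiers.
    static String toJson(Object value) {
        if (value instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                if (!first) {
                    sb.append(",");
                }
                sb.append('"').append(e.getKey()).append("\":")
                  .append(toJson(e.getValue()));
                first = false;
            }
            return sb.append("}").toString();
        }
        return "\"" + value + "\"";
    }

    // Mapping for one field indexed twice: unstemmed under its own name,
    // and stemmed under the (hypothetical) sub-field "stemmed".
    static String mapping(String type, String field, String stemmedAnalyzer) {
        Map<String, Object> subFields = obj(
                field, obj("type", "string", "analyzer", "standard"),
                "stemmed", obj("type", "string", "analyzer", stemmedAnalyzer));
        return toJson(obj(type, obj("properties",
                obj(field, obj("type", "multi_field", "fields", subFields)))));
    }

    public static void main(String[] args) {
        System.out.println(mapping("doc", "body", "custom"));
    }
}
```

With a mapping along these lines, a query against `body.stemmed` would be analyzed and matched using the stemmed terms, while `body` keeps the unstemmed ones. One sub-field per language analyzer is one way to approach the multiple-languages question.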

HTH

David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs



(Jason-5) #3

Hi David,

Sorry for the late reply. I was out of town. Just wanted to say thanks!

  • Jason.


