Search analyzer not being applied


(Mattias Nordberg) #1

Hi,

I've got an index which has been configured to use the snowball
analyzer (English) as both index_analyzer and search_analyzer. The
problem is that it doesn't appear to be applied to any search queries,
but works perfectly at indexing time.

My analyzer in elasticsearch.json:
"index":{
  "analysis":{
    "analyzer":{
      "snowball_en":{
        "type":"snowball",
        "language":"English"
      }
    }
  }
}

Looking at _cluster/state, I've successfully configured a template
that will assign that analyzer to any new index ending in "_en".
"templates" : {
  "english_index" : {
    "template" : "*_en",
    "order" : 0,
    "settings" : {
    },
    "mappings" : {
      "webpage" : {
        "index_analyzer" : "snowball_en",
        "search_analyzer" : "snowball_en"
      }
    }
  }
}

Again in _cluster/state, the English index:
  "indices" : {
    "test_en" : {
      "state" : "open",
      "settings" : {
        "index.number_of_shards" : "5",
        "index.number_of_replicas" : "1"
      },
      "mappings" : {
        "webpage" : {
          "_source" : {
            "compress" : true
          },
          "dynamic_templates" : [ {
            "everything_else" : {
              "mapping" : {
                "include_in_all" : false,
                "index" : "not_analyzed",
                "type" : "string"
              },
              "match_mapping_type" : "string",
              "match" : "*"
            }
          } ],
          "analyzer" : "snowball_en",
          "properties" : {
            "id" : {
              "include_in_all" : false,
              "index" : "not_analyzed",
              "type" : "string"
            },
            "title" : {
              "include_in_all" : true,
              "type" : "string"
            },
            "text" : {
              "include_in_all" : true,
              "type" : "string"
            },
            "language" : {
              "include_in_all" : false,
              "index" : "not_analyzed",
              "type" : "string"
            },
            "url" : {
              "include_in_all" : true,
              "index" : "not_analyzed",
              "type" : "string"
            },
            "fields" : {
              "dynamic" : "true",
              "properties" : {
                "tags" : {
                  "include_in_all" : false,
                  "index" : "not_analyzed",
                  "type" : "string"
                }
              }
            }
          }
        }
      },
      "aliases" : [ ]
    }
  }
}

What I get when I run a test analysis against that index:
curl -XGET 'localhost:9200/test_en/_analyze' -d 'getting started'
{"tokens":[{"token":"getting","start_offset":0,"end_offset":7,"type":"","position":1},{"token":"started","start_offset":8,"end_offset":15,"type":"","position":2}]}

If I explicitly specify the analyzer:
curl -XGET 'localhost:9200/test_en/_analyze?analyzer=snowball_en' -d 'getting started'
{"tokens":[{"token":"get","start_offset":0,"end_offset":7,"type":"","position":1},{"token":"start","start_offset":8,"end_offset":15,"type":"","position":2}]}

My understanding was that specifying 'search_analyzer' would cause
Elasticsearch to analyze the query string with that analyzer, in which
case the two requests above should return the same result. Is that not
the case?

Best regards
Mattias


(Shay Banon) #2

You set it at the mapping level for webpage, which has no concept of
index_analyzer or search_analyzer. Do you want to set it as the default
analyzer for the index? If so, in your analysis config, simply configure an
analyzer named "default" that has the snowball properties.

On Wed, Aug 31, 2011 at 4:43 PM, Mattias Nordberg <
mattias.nordberg@gmail.com> wrote:


(Mattias Nordberg) #3

Well, yes, but I want the analyzer to depend on the language of the
index, which I indicate with a language-code suffix on the index name.
I've got two templates, one for English indices and one for Swedish
indices: _en should get snowball_en and _sv should get snowball_sv.
Can I set the analyzer in the template? I tried, but as you say it
ended up on the mapping :)

Many thanks
Mattias

On Aug 31, 3:07 pm, Shay Banon kim...@gmail.com wrote:


(Mattias Nordberg) #4

I got it working by passing in the module settings in the template, as
the documentation mentioned :) Thanks for your help.
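
For readers wondering what that fix might look like, here is a rough
sketch of a template body. It is reconstructed from the thread and is
not the poster's actual template. It assumes the analysis settings can
be nested under the template's "settings" key, and it names the analyzer
"default", as suggested earlier in the thread, so that it applies at both
index and search time. The index_analyzer and search_analyzer entries
under "mappings" are left out, since the mapping level did not honor
them.

```json
{
  "template": "*_en",
  "order": 0,
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "default": {
            "type": "snowball",
            "language": "English"
          }
        }
      }
    }
  }
}
```

A Swedish template would presumably differ only in the pattern ("*_sv")
and the language ("Swedish"). Since templates only apply to newly created
indices, one way to check the result is to create a fresh index matching
the pattern and rerun the plain _analyze request from the first post
against it.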

