What analyzer does query_string use for highlighting?


(Weiwei Wang) #1

I search across multiple fields, and each field has a different
search_analyzer. When highlighting, I found that the fragments are
not as expected.

For example, I have two fields, name and phone, and two analyzers
in my elasticsearch.json:
"analysis" : {
    "analyzer" : {
        "nGramAnalyzer" : {
            "type" : "custom",
            "tokenizer" : "standard",
            "filter" : ["standard", "lowercase", "englishSnowball", "nGramFilter"]
        },
        "standardAnalyzer" : {
            "type" : "custom",
            "tokenizer" : "standard",
            "filter" : ["standard", "lowercase", "englishSnowball"]
        }
    },
    "filter" : {
        "nGramFilter" : {
            "type" : "nGram",
            "min_gram" : 1,
            "max_gram" : 64
        },
        "edgeNGramFilter" : {
            "type" : "edgeNGram",
            "min_gram" : 1,
            "max_gram" : 64,
            "side" : "front"
        },
        "englishSnowball" : {
            "type" : "snowball",
            "language" : "English"
        }
    }
}
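To see why a query for 18600 can match at all, it helps to compare what the two analyzers produce for a phone number. This Python sketch is a rough approximation, not Elasticsearch code: `standard_like` and `ngram_like` are hypothetical helpers, whitespace splitting stands in for the standard tokenizer, and the standard and snowball filters are assumed to leave a digit-only token unchanged.

```python
def standard_like(text):
    """Rough stand-in for standardAnalyzer: split on whitespace and lowercase.
    Assumes the standard and snowball filters leave digit-only tokens as-is."""
    return [t.lower() for t in text.split()]


def ngram_like(text, min_gram=1, max_gram=64):
    """Rough stand-in for nGramAnalyzer: the same tokens, then every n-gram
    of each token with min_gram <= n <= max_gram (the nGramFilter settings)."""
    grams = []
    for token in standard_like(text):
        for size in range(min_gram, min(max_gram, len(token)) + 1):
            for i in range(len(token) - size + 1):
                grams.append(token[i:i + size])
    return grams


phone = "18600044220"
print(standard_like(phone))          # ['18600044220']
print(len(ngram_like(phone)))        # 66
print("18600" in ngram_like(phone))  # True
```

An 11-digit value yields 66 grams at index time, and one of them is exactly the single term "18600" that a standardAnalyzer-style query produces.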

The mappings for the fields are:
"phone" : {
    "type" : "string",
    "index" : "analyzed",
    "index_analyzer" : "nGramAnalyzer",
    "search_analyzer" : "nGramAnalyzer",
    "store" : "yes",
    "term_vector" : "with_positions_offsets"
},
"phone" : {
    "type" : "string",
    "index" : "analyzed",
    "index_analyzer" : "nGramAnalyzer",
    "search_analyzer" : "standardAnalyzer",
    "store" : "yes",
    "term_vector" : "with_positions_offsets"
}

When I run a query_string query like this one:
curl '10.18.102.101:9201/pim/contact/_search?pretty=true' -d '{
  "from": 0,
  "size": 2,
  "query": {
    "query_string": {
      "query": "18600",
      "fields": ["name^5.0", "phone^5.0"],
      "default_operator": "or",
      "allow_leading_wildcard": false,
      "analyze_wildcard": true
    }
  },
  "filter": {"bool": {"must": {"term": {"deleted": 0}}}},
  "explain": false,
  "fields": ["name", "phone"],
  "highlight": {
    "pre_tags": ["<span class=\"hl\">"],
    "post_tags": [""],
    "fields": {"name": {}, "phone": {}}
  }
}'
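Building the same search body programmatically avoids hand-escaping the double quotes inside pre_tags. This is a minimal standard-library Python sketch: the host, index, type and every body field are copied from the post (post_tags kept exactly as `[""]`), and the request is only constructed, not sent.

```python
import json
import urllib.request

# The search body from the post; json.dumps takes care of escaping the
# double quotes in pre_tags, so no manual backslashes are needed.
body = {
    "from": 0,
    "size": 2,
    "query": {
        "query_string": {
            "query": "18600",
            "fields": ["name^5.0", "phone^5.0"],
            "default_operator": "or",
            "allow_leading_wildcard": False,
            "analyze_wildcard": True,
        }
    },
    "filter": {"bool": {"must": {"term": {"deleted": 0}}}},
    "explain": False,
    "fields": ["name", "phone"],
    "highlight": {
        "pre_tags": ['<span class="hl">'],
        "post_tags": [""],
        "fields": {"name": {}, "phone": {}},
    },
}

payload = json.dumps(body).encode("utf-8")
request = urllib.request.Request(
    "http://10.18.102.101:9201/pim/contact/_search?pretty=true",
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(request) would send it; this sketch only prints the body.
print(payload.decode("utf-8"))
```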

The highlight is shown as:
18600044220</em>

It seems the highlighter uses nGramAnalyzer for highlighting, but
I expect it to use each field's own search_analyzer.

Can anyone help me with this problem?

Elasticsearch version: 0.18.4


(Goog Jobs) #2

Two mappings for the same field? Lucene requires index_analyzer to
be the same as search_analyzer. Hope this helps.
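The duplicate-field observation is easy to check: the mapping in the original post declares "phone" twice. Python's json module, like many JSON parsers, silently keeps the last occurrence of a duplicate key, so only one of the two definitions survives parsing; whether Elasticsearch 0.18's own parser behaves the same way is an assumption here. A minimal sketch, with a hypothetical helper `reject_duplicate_keys`, showing the default behavior and a stricter check:

```python
import json


def reject_duplicate_keys(pairs):
    """object_pairs_hook that fails loudly instead of silently keeping the last key."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key in mapping: {key!r}")
        seen[key] = value
    return seen


# Trimmed-down version of the two "phone" mappings from the post.
mapping = """{
  "phone": {"index_analyzer": "nGramAnalyzer", "search_analyzer": "nGramAnalyzer"},
  "phone": {"index_analyzer": "nGramAnalyzer", "search_analyzer": "standardAnalyzer"}
}"""

# Default parsing: the second "phone" replaces the first without any warning.
print(json.loads(mapping)["phone"]["search_analyzer"])  # standardAnalyzer

# Strict parsing: the duplicate is reported instead.
try:
    json.loads(mapping, object_pairs_hook=reject_duplicate_keys)
except ValueError as err:
    print(err)  # duplicate key in mapping: 'phone'
```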

On Nov 30, 10:14 pm, Weiwei Wang ww.wang...@gmail.com wrote:

(medcl.net) #3

Because the term positions AND offsets are generated and stored during
indexing, not during searching.
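This point can be illustrated with a toy model of a term vector. In the Python sketch below (hypothetical helpers `ngram_term_vector` and `highlight`, not Elasticsearch's highlighter), the gram offsets are fixed once at index time by the nGram analysis, and the query only contributes the terms to look up. It assumes the field value is a single token and that each gram keeps its own sub-offsets, which is assumed here to match the Lucene 3.x-era nGram filter.

```python
def ngram_term_vector(value, min_gram=1, max_gram=64):
    """Toy index-time term vector for a single-token value: each n-gram maps
    to the (start, end) character offsets it covers. Assumes every gram keeps
    its own sub-offsets within the original token."""
    vector = {}
    for size in range(min_gram, min(max_gram, len(value)) + 1):
        for i in range(len(value) - size + 1):
            vector.setdefault(value[i:i + size], []).append((i, i + size))
    return vector


def highlight(value, vector, query_terms, pre="<em>", post="</em>"):
    """Wrap the stored offsets of every query term found in the term vector.
    Offsets are never recomputed at search time. At each start position the
    longest span wins, and spans overlapping an earlier one are dropped."""
    spans = sorted({s for t in query_terms for s in vector.get(t, [])},
                   key=lambda span: (span[0], -span[1]))
    out, last = [], 0
    for start, end in spans:
        if start < last:
            continue
        out.append(value[last:start] + pre + value[start:end] + post)
        last = end
    out.append(value[last:])
    return "".join(out)


phone = "18600044220"
vector = ngram_term_vector(phone)  # built once, at "index time"

# Query analyzed like standardAnalyzer: a single term.
print(highlight(phone, vector, ["18600"]))
# <em>18600</em>044220

# Query analyzed like nGramAnalyzer: every gram of "18600".
print(highlight(phone, vector, set(ngram_term_vector("18600"))))
# <em>18600</em><em>0</em>4422<em>0</em>
```

The stored offsets are identical in both runs; only the set of query terms differs. With standardAnalyzer-style terms just the matching gram is marked, while nGramAnalyzer-style terms hit many short grams and the highlight breaks into pieces.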

-----Original Message-----
From: Weiwei Wang
Sent: Wednesday, November 30, 2011 10:14 PM
To: elasticsearch
Subject: what analyzer does query_string use for highlighting?

(Weiwei Wang) #4

Thanks, but when I disable term_vector, everything works fine.

On Dec 2, 6:33 pm, medcl2...@gmail.com wrote:
