How to get more accurate search results when inputting Chinese text directly?

Hi,
Let me explain my question.
I have documents that include a movie's starring cast, for example:
{"index": {"_id": "2"}}
{ "starring": "邓光荣##区瑞强", "workName": "怒拔太阳旗", "length": "85:17"}

I set the following index settings and mapping for the starring field:
curl -XPUT "http://localhost:9200/metadata/" -d'
{
  "index" : {
    "analysis" : {
      "analyzer" : {
        "name_analyzer" : {
          "tokenizer" : "name_tokenizer",
          "filter" : ["full_pinyin_no_space","my_edge_ngram_tokenizer"]
        }
      },
      "tokenizer": {
        "name_tokenizer": {
          "type": "pattern",
          "pattern": "##"
        }
      },
      "filter" : {
        "full_pinyin_no_space" : {
          "type" : "pinyin",
          "first_letter" : "none",
          "keep_separate_first_letter": true,
          "padding_char" : ""
        },
        "my_edge_ngram_tokenizer" : {
          "type" : "edge_ngram",
          "min_gram" : "1",
          "max_gram" : "6",
          "token_chars": [ "letter", "digit" ]
        }
      }
    }
  }
}'

curl -XPOST http://localhost:9200/metadata/movies/_mapping -d '
{
  "properties": {
    "starring": {
      "type": "string",
      "analyzer": "ik_max_word",
      "fields": {
        "pinyin": {
          "type": "string",
          "analyzer": "name_analyzer"
        }
      }
    }
  }
}'

With this setup, a search returns results when the keyword is either "邓光荣" or "dgr", the pinyin first letters of "邓光荣".

My question is: how can I get more accurate results, so that only documents containing "邓光荣" are returned when the input is the Chinese text "邓光荣"?

Also, when searching the starring field, can I set a different analyzer dynamically for each search?

Thanks very much!

One way to do this, if I understand your question correctly, is to use an additional query clause to boost hits that fully match the query term. Your analyzer produces edge n-grams, i.e. 邓光荣 -> [邓, 邓光, 邓光荣], so docs that only have 邓光 will still match and might even be scored better. One option is to add a should clause to your query containing a term query on that field. The term query text is not analyzed at all, like this:

"term" : { "starring" : "邓光荣" } 

That will give docs matching this exact name a boost compared to other hits that only match on partial n-grams.
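
To make the idea concrete, here is a minimal sketch of how the full request body could look, written as a small Python helper that only builds and prints the JSON. The original search query was not posted, so the multi_match over "starring" and "starring.pinyin" in the must clause is an assumption, and the helper name build_starring_query and the boost value 5.0 are made up for illustration. Note also that the term clause can only match if the analyzer on "starring" emits the full name as a single token at index time.

```python
import json


def build_starring_query(keyword, boost=5.0):
    """Build a bool query: broad match in `must`, exact-term boost in `should`.

    Hypothetical helper; the field list and boost value are assumptions.
    """
    return {
        "query": {
            "bool": {
                # Broad recall: assumed base query over the analyzed field
                # and its pinyin sub-field from the mapping in the question.
                "must": [
                    {
                        "multi_match": {
                            "query": keyword,
                            "fields": ["starring", "starring.pinyin"],
                        }
                    }
                ],
                # Optional clause: it does not filter anything out, it only
                # raises the score of docs whose indexed token equals the
                # keyword exactly (term queries skip analysis).
                "should": [
                    {"term": {"starring": {"value": keyword, "boost": boost}}}
                ],
            }
        }
    }


if __name__ == "__main__":
    body = build_starring_query("邓光荣")
    # Print the body so it can be pasted into a curl -d '...' call.
    print(json.dumps(body, ensure_ascii=False, indent=2))
```

The printed body could then be sent to the _search endpoint of the metadata index. Because the term clause sits under should, partial matches are still returned, just ranked below the exact ones.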

Hope that helps.
