ES 2.4.1 and Kuromoji plugin: specifying a field in the search query


(Tinh Huynh) #1
I started using Elasticsearch (version 2.4.1) in my project two weeks ago, and I have a problem when I specify a field in the query string.
I want to use the Kuromoji plugin and an n-gram tokenizer to search Japanese data.
1/ In my query, if I don't specify the field (for example: "Content"), I receive 2 records in the result.
    {
        "query" : {
            "bool" : {
                "must" : {
                    "query_string" : {
                        "query" : "Software"
                        /*, "fields" : ["Content"]   <-- this field is not specified */
                    }
                }
            }
        }
    }
2/ But when I specify the field "Content" in the above query, the result contains no records.
(In my project, I want to search on the "Content" field.)
3/ I also used the "highlight" attribute in step 1, but the result doesn't contain a "highlight" block:
    {...
      "highlight":{
         "pre_tags" : ["<tag1>"],
         "post_tags" : ["</tag1>"],
         "fields" : {
            "*" : {} /* or use "_all" */
         }
      }
    }
My question: in step 2 (above), what field name should be specified in the query string? (For example "product.Content", or something else?)
If I don't use the Kuromoji plugin, the query in step 2 returns 2 records,
so I think the Kuromoji plugin is related to the result.
Can anybody help me with this problem?

Here are my mappings and my YAML config:
+My mappings:
    {....
      "mappings" : {
        "product" : {
           "properties" : {
              "Content" : {
                "index": "not_analyzed",
                "search_analyzer": "ja",
                "analyzer": "ja",
                "type": "string",
                "store": true
              }
             ....
           }
        }
      }
    }

+Config in file elasticsearch.yml:
    index :
      analysis :
        analyzer :
          ja :
            type : custom
            tokenizer : ja_tokenizer
            char_filter : [
              html_strip,
              kuromoji_iteration_mark
            ]
            filter : [
              lowercase,
              cjk_width,
              katakana_stemmer,
              kuromoji_part_of_speech
            ]
          ja_ngram :
            type : custom
            tokenizer : ngram_ja_tokenizer
            char_filter : [html_strip]
            filter : [
              cjk_width,
              lowercase
            ]
        tokenizer :
          ja_tokenizer :
            type : kuromoji_tokenizer
            mode : search
            user_dictionary : userdict_ja.txt
          ngram_ja_tokenizer :
            type : nGram
            min_gram : 2
            max_gram : 3
            token_chars : [letter, digit]
        filter :
          katakana_stemmer :
            type : kuromoji_stemmer

(Jun Ohtani) #2

Hi @tinhhp ,
Could you please use </> to format your query and mappings?

You should remove "index": "not_analyzed" from the Content field.
See : https://www.elastic.co/guide/en/elasticsearch/reference/2.4/string.html
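
For illustration, here is a minimal sketch of the Content mapping with that setting removed. It assumes the rest of the mapping stays as posted; the other properties are omitted. With "index": "not_analyzed", the field value is indexed as a single unanalyzed term, so an analyzed query term such as "software" would not be expected to match it. A query_string query without "fields" searches _all by default, and _all is analyzed separately, which is probably why step 1 still returned results.

```json
{
  "mappings": {
    "product": {
      "properties": {
        "Content": {
          "type": "string",
          "analyzer": "ja",
          "search_analyzer": "ja",
          "store": true
        }
      }
    }
  }
}
```

The "index" setting of an existing field cannot be changed in place, so applying this most likely means recreating the index with the new mapping and reindexing the documents.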


(Tinh Huynh) #3

Thanks, Ohtani!
It resolved my problem.

