Chinese and Japanese are hard to get right, but below I've included what you need to get the basics working. You will need to install the kuromoji plugin, the smartcn plugin and the icu plugin.
For search, you should use the smartcn analyzer for Chinese, and the kuromoji analyzer for Japanese. For aggregations, if you want to use the terms aggregation, then you just need to set the field you want to aggregate on to be not_analyzed. That way, it'll use the whole value of that field as the term.
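To make that concrete, here is a rough sketch of a request body that searches the analyzed field and aggregates on the not_analyzed one. It assumes the field names search_text and aggs_text from the mappings further down; the function name buildSearchBody and the aggregation name by_value are made up for illustration.

```javascript
// Builds a search body: a full-text match on the analyzed field plus a
// terms aggregation on the not_analyzed field.
function buildSearchBody(queryText) {
  return {
    query: { match: { search_text: queryText } },
    aggs: {
      // "by_value" is a hypothetical aggregation name.
      by_value: { terms: { field: "aggs_text" } }
    }
  };
}

// Hypothetical usage against a local node (host and port are assumptions):
// fetch("http://localhost:9200/chinese/_search", {
//   method: "POST",
//   body: JSON.stringify(buildSearchBody("北京"))
// });
```

Because aggs_text is not analyzed, each bucket key should be the full stored value rather than individual tokens.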
The typeahead search is where things get trickier. You should use the completion suggester for both of them, with preserve_separators set to false and without fuzziness. These suggesters will need a custom analyzer for each language.
For Chinese, you need this:
PUT /chinese
{
  "settings": {
    "analysis": {
      "filter": {
        "pinyin": {
          "type": "icu_transform",
          "id": "Han-Latin"
        }
      },
      "analyzer": {
        "autocomplete": {
          "tokenizer": "keyword",
          "filter": [
            "pinyin",
            "lowercase",
            "cjk_width"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "search_text": {
          "type": "string",
          "analyzer": "smartcn"
        },
        "aggs_text": {
          "type": "string",
          "index": "not_analyzed"
        },
        "suggest_text": {
          "type": "completion",
          "index_analyzer": "autocomplete",
          "search_analyzer": "autocomplete",
          "preserve_separators": false
        }
      }
    }
  }
}
And for Japanese, this:
PUT /japanese
{
  "settings": {
    "analysis": {
      "filter": {
        "romaji": {
          "type": "kuromoji_readingform",
          "use_romaji": true
        }
      },
      "analyzer": {
        "autocomplete": {
          "tokenizer": "kuromoji_tokenizer",
          "filter": [
            "lowercase",
            "cjk_width",
            "romaji"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "search_text": {
          "type": "string",
          "analyzer": "kuromoji"
        },
        "aggs_text": {
          "type": "string",
          "index": "not_analyzed"
        },
        "suggest_text": {
          "type": "completion",
          "index_analyzer": "autocomplete",
          "search_analyzer": "autocomplete",
          "preserve_separators": false
        }
      }
    }
  }
}
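Once an index exists, using the suggester might look roughly like the sketch below. It assumes the older-style _suggest endpoint that goes with the index_analyzer mappings above (where suggestions sit at the top level of the body), and it copies one value into all three fields purely for illustration. The names buildDocument, buildSuggestBody and autocomplete are hypothetical.

```javascript
// Builds a document that feeds the same text to the search, aggregation
// and suggest fields. Copying one value into all three is an assumption
// made for this example; real data may use different values per field.
function buildDocument(text) {
  return {
    search_text: text,
    aggs_text: text,
    suggest_text: { input: text }
  };
}

// Builds a completion suggest body. No "fuzzy" option is set, in line
// with the advice to avoid fuzziness.
function buildSuggestBody(prefix) {
  return {
    // "autocomplete" is a hypothetical name for this suggestion.
    autocomplete: {
      text: prefix,
      completion: { field: "suggest_text" }
    }
  };
}

// Hypothetical usage against a local node (host and port are assumptions):
// fetch("http://localhost:9200/japanese/_suggest", {
//   method: "POST",
//   body: JSON.stringify(buildSuggestBody("東"))
// });
```

The prefix goes through the same autocomplete analyzer as the indexed input, so what matters is that both sides end up with the same romanized form.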
Another issue that you may not have encountered until now is making autocomplete work in the browser. The problem is that, for example, Chinese users need to type several keys to produce a single character, but the browser will only fire the keypress event once the whole character has been entered. Really you want to intercept the keypresses earlier in the process.
This Stack Overflow question may help point you in the right direction: http://stackoverflow.com/questions/7316886/detecting-ime-input-before-enter-pressed-in-javascript
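As a starting point, here is a minimal sketch of one way to do it, using the DOM composition events (compositionstart, compositionupdate, compositionend) rather than keypress. The function name attachImeAwareAutocomplete and the onQuery callback are hypothetical, and event ordering around compositionend has varied between browsers, so treat this as something to test rather than a finished solution.

```javascript
// Calls onQuery with text to look up, including while an IME
// composition is still in progress.
function attachImeAwareAutocomplete(input, onQuery) {
  let composing = false;

  input.addEventListener("compositionstart", () => {
    composing = true;
  });

  // Fires as the uncommitted composition text changes. event.data holds
  // only the in-progress text, not the whole value of the input.
  input.addEventListener("compositionupdate", (event) => {
    onQuery(event.data);
  });

  // The composition has been committed; query with the full value.
  input.addEventListener("compositionend", () => {
    composing = false;
    onQuery(input.value);
  });

  // Ordinary (non-IME) typing. Skipped during composition because
  // compositionupdate already covers that case.
  input.addEventListener("input", () => {
    if (!composing) onQuery(input.value);
  });
}

// Hypothetical usage in a page (the selector is an assumption):
// attachImeAwareAutocomplete(
//   document.querySelector("#search"),
//   (text) => console.log("would request suggestions for", text)
// );
```

In practice you would probably debounce onQuery as well, since some browsers fire an input event right after compositionend and that would trigger a second lookup for the same text.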