ICU Analysers for Elasticsearch

(Akhil Suresh) #1

Does Elasticsearch support a Korean language analyser? I need help with that.

(Igor Kupczyński) #2

Hi @Akhil_Suresh,

When I worked at Egnyte we were able to tokenize Korean using the ICU Tokenizer. Please take a look at this blog post.

In general, ICU will let you tokenize languages where words are not space-delimited (like Korean) and will fold national characters to their ASCII versions (like in French or Polish, é --> e).
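
To illustrate what folding means, here is a minimal Python sketch that uses only the standard library. It is not ICU and not what Elasticsearch runs internally; the function name `fold_to_ascii` is made up for this example. It only approximates folding for accented Latin letters, does not handle letters without a canonical decomposition (such as the Polish ł), and should not be applied to Korean text, because NFKD splits Hangul syllables into jamo.

```python
import unicodedata


def fold_to_ascii(text):
    """Rough approximation of accent and case folding for Latin-script text.

    Hypothetical helper for illustration only; the real icu_folding filter
    covers far more characters and scripts.
    """
    # Decompose characters, e.g. "é" becomes "e" followed by a combining accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks left behind by the decomposition.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case-fold so that "Cafe" and "cafe" compare equal.
    return stripped.casefold()


print(fold_to_ascii("Café"))
print(fold_to_ascii("é"))
```

The case-folding step is also why a separate lowercase filter becomes redundant once folding is in the pipeline.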

Hope this helps.


(Akhil Suresh) #3

Thanks @igor_k for the response.

This is how I used the language analyzer. I am not able to query all Korean words; only some of them work. Please let me know if any modifications are required.

  analysis: {
    char_filter: {
      hyphen_mapping: {
        type: "mapping",
        mappings: [
    filter: {
      korean_collation: {
        type: "icu_collation",
        language: "ko",
        country: "KR",
        decomposition: "canonical"
    analyzer: {
      custom_with_char_filter: {
        tokenizer: "standard",
        char_filter: [
        filter: ["standard", "lowercase", "stop", "porter_stem"]
      korean: {
        tokenizer: "icu_tokenizer",
        char_filter: [
        filter: ["icu_normalizer", "lowercase", "stop", "porter_stem", "korean_collation"]

mappings: {
  document: {
    properties: {

(Igor Kupczyński) #4

Hi, I have never tried to stem Korean words. I think the issue is in your filter pipeline. You have porter_stem, but its web page suggests it is an English-only stemmer.

The Porter stemming algorithm (or ‘Porter stemmer’) is a process for removing the commoner morphological and inflexional endings from words in English.

Try removing it. Also, you can start simple, with icu_tokenizer and icu_folding, and see where that leads you. For example, if you use folding you do not need the lowercase filter.

You can start with this example and try your Korean searches there (I do not know Korean, so it is hard for me to give more than general tips). Then you can build it up if you need fancier features.
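
As a rough sketch of such a minimal starting point (not the linked example itself), the index settings could be built like this in Python. The analyzer name `korean_simple` and the helper `build_index_settings` are made up for illustration, and the sketch assumes the analysis-icu plugin is installed so that `icu_tokenizer` and `icu_folding` are available. It only prints the JSON body; sending it to a cluster is left out.

```python
import json


def build_index_settings(analyzer_name="korean_simple"):
    """Return an index-creation body with a minimal ICU-based analyzer.

    The analyzer name is hypothetical; icu_tokenizer and icu_folding are
    assumed to be provided by the analysis-icu plugin.
    """
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    analyzer_name: {
                        "type": "custom",
                        "tokenizer": "icu_tokenizer",
                        # No lowercase, stop or porter_stem filters here:
                        # folding already handles case, and the stemmer
                        # targets English.
                        "filter": ["icu_folding"],
                    }
                }
            }
        }
    }


index_body = build_index_settings()
print(json.dumps(index_body, indent=2, ensure_ascii=False))
```

Starting from a pipeline this small makes it easier to see which filter changes the results when more are added back one at a time.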

Hope this helps,

(Akhil Suresh) #5

Thanks @igor_k. Partial text search for Korean text is not working. For example, if we search for "에프알엘코리아" we get 100 results, but if we search for "에프알" we do not get any results. This text belongs to a field named "sections". Do I need to add a particular analyzer for this field to enable partial text search? Please help.
