Prevent the standard tokenizer from splitting <IDEOGRAPHIC> tokens per character

Hello, is it possible to apply the N-gram token filter to each <IDEOGRAPHIC> phrase as a whole, just like it already works on <HANGUL>?

For example (min_gram=max_gram=2 & preserve_original):
Input: "我 爱 青苹果"
Desired Output: "我", "爱", "青苹", "苹果", and "青苹果"
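For reference, here is a minimal sketch of how the two-gram tokenizer and filter referenced in the settings below could be defined. The values just follow min_gram = max_gram = 2 and preserve_original from above; note that preserve_original is only supported on the ngram token filter (not the tokenizer), and only in newer Elasticsearch versions:

    {
      "settings": {
        "analysis": {
          "tokenizer": {
            "ngram_tokenizer_2_2": {
              "type": "ngram",
              "min_gram": 2,
              "max_gram": 2
            }
          },
          "filter": {
            "ngram_filter_2_2": {
              "type": "ngram",
              "min_gram": 2,
              "max_gram": 2,
              "preserve_original": true
            }
          }
        }
      }
    }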

  1. If I set ngram as the tokenizer, the grams include whitespace too:
    Setting:
        {
          "type": "custom",
          "tokenizer": "ngram_tokenizer_2_2"
        }

Output: "我<WHITESPACE>" (undesired), "<WHITESPACE>爱" (undesired), "爱<WHITESPACE>" (undesired), "<WHITESPACE>青" (undesired), "青苹", and "苹果"
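This can be reproduced without creating an index by passing an inline tokenizer definition to the _analyze API (a sketch, assuming it matches ngram_tokenizer_2_2):

    POST _analyze
    {
      "tokenizer": { "type": "ngram", "min_gram": 2, "max_gram": 2 },
      "text": "我 爱 青苹果"
    }

The ngram tokenizer treats whitespace like any other character, which is why the spaces end up inside the grams.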

  2. If I set ngram as a token filter and use the standard tokenizer, it splits "青苹果" per character instead:
    Setting:
        {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["ngram_filter_2_2"]
        }

Output: "我", "爱", "青" (undesired), "苹" (undesired), and "果" (undesired)
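The split happens at the tokenizer stage, before the ngram filter ever runs; running just the standard tokenizer by itself shows this (sketch via the _analyze API):

    POST _analyze
    {
      "tokenizer": "standard",
      "text": "我 爱 青苹果"
    }

This returns 我, 爱, 青, 苹, 果 as five separate <IDEOGRAPHIC> tokens, so the 2-gram filter never sees the whole phrase 青苹果.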

  3. If I set ngram as a token filter and use the whitespace tokenizer, it returns the expected tokens, but it fails on mixed-language input such as "I love青苹果":
    Setting:
        {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["ngram_filter_2_2"]
        }

Output: "I", "lo", "ov", "ve", "e青" (undesired), "青苹", "苹果", and "love青苹果" (undesired)
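Again, this can be reproduced with _analyze using an inline filter definition (a sketch, assuming it matches ngram_filter_2_2; preserve_original on the ngram token filter needs a reasonably recent Elasticsearch version):

    POST _analyze
    {
      "tokenizer": "whitespace",
      "filter": [
        { "type": "ngram", "min_gram": 2, "max_gram": 2, "preserve_original": true }
      ],
      "text": "I love青苹果"
    }

Because the whitespace tokenizer emits "love青苹果" as a single token, the filter builds grams straight across the script boundary, which is where the "e青" gram comes from.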

Thank you in advance!