Can I add additional filters to a non-custom analyzer type?

Hi All,

Sorry for the noob question. I am new to Elasticsearch.

I want my analyzer to support query string queries in both Chinese and
English.

The cjk analyzer works fine for me. It tokenizes the query string
correctly.
But I would like to add additional filters to this analyzer (e.g. kstem,
asciifolding, etc.).

My Elasticsearch version is 0.19.10 (I am not going to upgrade it for now).
It does not support the token filters "cjk_width" and "cjk_bigram", and
I cannot create a custom cjk tokenizer for now.

Also, I do not want to use the Combo Analyzer Plugin.

Is there any way that I can add additional filters to the cjk analyzer? Or
is there an alternative for my situation?

Many thanks!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

You would need to recreate the analyzer yourself using the existing
tokenizers/filters.

Here is a brief discussion on the topic:
https://groups.google.com/forum/#!msg/elasticsearch/GV67Dw4Afcc/Pd84RfMeC-AJ

Elasticsearch 0.19 uses Lucene 3.5, so the first suggestion should work.

Cheers,

Ivan

(In reply to Hui's message of Thu, Mar 14, 2013 at 2:59 AM.)

Hi Ivan,

Thanks for your reply and reference.

I am fine with recreating the analyzer myself using the existing
tokenizers/filters.

But I found that there is no cjk tokenizer listed in the tokenizer section
of the Elasticsearch analysis documentation, so a custom analyzer like this
does not seem possible:

"analyzer" : {
    "default" : {
        "type" : "custom",
        "tokenizer" : "cjk",
        "filter" : ["kstem"]
    }
}

For analyzers, there is a cjk analyzer, but I cannot add additional
token filters such as kstem to it:

"analyzer" : {
    "default" : {
        "type" : "cjk",
        "filter" : ["kstem"]
    }
}

As for token filters, the "cjk_width" and "cjk_bigram" filters are not
exposed in Elasticsearch 0.19.

Am I going about the analyzer customization the wrong way?

Thanks.

(In reply to Ivan Brusic's message of Friday, March 15, 2013 2:45:19 AM UTC+8.)

CJKWidthFilter was introduced in Lucene 3.6, so it is not available in
Elasticsearch 0.19.

Elasticsearch's documentation is not versioned (not even in source
control), so you cannot find documentation for 0.19 online anymore.
CJKTokenizer should be included with ES 0.19.
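
For illustration only, here is a minimal Python sketch that builds the index
settings for such a recreated analyzer. It assumes a tokenizer is actually
registered under the name "cjk" in your 0.19.10 install, which nothing in
this thread confirms (Hui could not find one in the docs). The analyzer name
"cjk_with_filters" and the helper build_analysis_settings are made up for the
example. If no tokenizer is registered under that name, the Lucene
CJKTokenizer would have to be exposed some other way before this could work.

```python
import json

# ASSUMPTION: a tokenizer is registered under the name "cjk" in this
# Elasticsearch version. The thread does not confirm this.
ASSUMED_CJK_TOKENIZER = "cjk"


def build_analysis_settings(extra_filters):
    """Return index settings for a custom analyzer: CJK tokenizer plus filters."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    # Hypothetical analyzer name, chosen for this example.
                    "cjk_with_filters": {
                        "type": "custom",
                        "tokenizer": ASSUMED_CJK_TOKENIZER,
                        # Token filters are applied in the order listed.
                        "filter": list(extra_filters),
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    # Filter names taken from the original question.
    settings = build_analysis_settings(["asciifolding", "kstem"])
    print(json.dumps(settings, indent=2))
```

The printed JSON is meant as the body of an index-creation request. Whether
0.19.10 accepts it depends entirely on the tokenizer assumption above.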

--
Ivan

(In reply to Hui's message of Thu, Mar 14, 2013 at 6:29 PM.)