Dumb question- using the cjk analyzer

nathan_moore · September 22, 2011, 7:06pm

Hi everyone,

Sorry for the n00b post. I just started using ElasticSearch for a
dynamic website search. In English, everything just works great.
Love the JSON interface: I was able to create an index easily, query it
easily, and integrate it into the website easily.

It went so well that I've been asked to use ElasticSearch for a
Japanese text website search. Should be no problem, since the documentation
says Lucene's "cjk" analyzer is supported.

So how do I do it? Everything's in utf8, I try to execute a query:
curl -XGET 'http://localhost:9200/MySite/search/_search?pretty=1&q=\uff52\uff49\uff50'
This fails to return any results. Naturally- I need the cjk analyzer
to break out the Japanese characters. Cjk will do it:
curl 'localhost:9200/MySite/_analyze?pretty=1&analyzer=cjk' -d '\uff52\uff49\uff50'
And this breaks out the tokens without a problem.

Doing what I thought was the obvious:
curl 'localhost:9200/MySite/search/_search?pretty=1&q=\uff52\uff49\uff50' -d {
"settings":{
"analysis":{
"analyzer":"cjk"
}
}
}'

This gives an "unable to parse" error, so obviously, I've done
something really stupid with my syntax.

Hence my n00b question: what's wrong with my syntax? What dumb thing
have I done? Do I need to set some default elsewhere to use the cjk
analyzer?

Thanks in advance,

-Nathan

James_Cook · September 22, 2011, 7:26pm

Hi Nathan,

To override the default analyzer for a particular index, you would do this:

curl -XPUT 'http://localhost:9200/myindex' -d '
{
    "settings" : {
        "analysis" : {
            "analyzer" : {
                "default" : {
                    "type" : "cjk"
                }
            }
        }
    }
}'

If you want to do the same across all indices, you can add a similar configuration to the elasticsearch.yml file under the 'index' settings.
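
For reference, here is a minimal shell sketch of the whole flow: create the index with cjk as its default analyzer, then run a URI search against it. It assumes a node at http://localhost:9200 and the index name myindex from the command above. ES_URL, INDEX, SETTINGS, create_index and search are names made up for this sketch, not part of Elasticsearch. The Content-Type header is an addition: releases from the time of this thread do not need it, but newer ones reject the request without it. The analyzer lives in the index settings, which is most likely why a "settings" block in the body of a _search request does not help.

```shell
#!/bin/sh
# Sketch only: ES_URL and INDEX are assumed values; adjust them for your cluster.
ES_URL="${ES_URL:-http://localhost:9200}"
INDEX="${INDEX:-myindex}"

# Index settings that make cjk the default analyzer (same body as in the reply above).
SETTINGS='{"settings":{"analysis":{"analyzer":{"default":{"type":"cjk"}}}}}'

# Create the index with those settings. Run this before indexing any documents.
create_index() {
  curl -XPUT "$ES_URL/$INDEX" -H 'Content-Type: application/json' -d "$SETTINGS"
}

# URI search. -G moves the data into the query string and --data-urlencode
# percent-encodes the text, so Japanese characters can be passed as-is.
search() {
  curl -G "$ES_URL/$INDEX/_search" --data-urlencode "q=$1" -d 'pretty=1'
}
```

After sourcing the file, run create_index once, index the documents, then call search with the query text as its only argument. An index that already holds documents would most likely have to be recreated and reindexed, because content indexed earlier was analyzed with the old analyzer.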

nathan_moore · September 22, 2011, 8:26pm

Thank you so much! That was fast. Worked, too.

-Nathan

