Extending the Thai language analyzer


(Min Cha) #1

Hi folks.

I would like to develop a search system for the Thai language.
First of all, I found the Thai analyzer, and it seems good.

But it doesn't meet all of my requirements, so I would like to extend it.
For example, I would like to add an nGram token filter to the Thai analyzer.

How can I do this?
Please give me some advice.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/32a6dc5c-b3c2-43ab-b5c1-ef8a54f17c11%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Alexander Reelsen) #2

Hey,

you can easily create custom analyzers, which use the thai analyzer and the
ngram token filter. See

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-custom-analyzer.html

--Alex

(In reply to Min Cha's message of Fri, Feb 7, 2014, 6:57 AM.)


(Min Cha) #3

OK. I already know about custom analyzers.

But ThaiWordFilter is not aliased, so I can't refer to it by name. Alternatively, I can reference the factory class directly, as in the settings below. But ThaiWordFilterFactory doesn't have a zero-argument constructor, so an error occurs when I use these settings.

If I knew how to use a factory that has a one-argument constructor, or how to
extend ThaiAnalyzer, I could resolve this problem.

{
  "analyzer": {
    "thai_with_ngram": {
      "type": "custom",
      "tokenizer": "standard",
      "filters": ["standard", "lowercase", "thai", "thai_stop", "ngram"]
    }
  },
  "filter": {
    "thai": {
      "type": "org.apache.lucene.analysis.th.ThaiWordFilterFactory"   <-- An error occurs because of lack of a zero-argument constructor.
    },
    "thai_stop": {
      "type": "stop",
      "stopwords_path": "org/apache/lucene/analysis/th/stopwords.txt"
    },
    "ngram": { your ngram configuration here }
  }
}

(In reply to Alexander Reelsen's message of Monday, February 10, 2014, 9:45:36 PM UTC+9.)
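
For readers hitting the same constructor error, here is a minimal sketch of one possible way around it. It assumes an Elasticsearch release that ships a built-in `thai` tokenizer and the predefined `_thai_` stopword list. Later releases do, but the version discussed in this thread may not, so please check your own version first. With those, Thai word segmentation happens in the tokenizer, and no Lucene factory class has to be referenced. The filter name `thai_ngram` and the gram sizes are illustrative assumptions, not values from the thread. Note also that a custom analyzer lists its token filters under the key `filter`, whereas the settings above use `filters`.

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "thai_stop": {
          "type": "stop",
          "stopwords": "_thai_"
        },
        "thai_ngram": {
          "type": "ngram",
          "min_gram": 1,
          "max_gram": 2
        }
      },
      "analyzer": {
        "thai_with_ngram": {
          "type": "custom",
          "tokenizer": "thai",
          "filter": ["lowercase", "thai_stop", "thai_ngram"]
        }
      }
    }
  }
}
```

This body would be sent when creating the index. Running some sample Thai text through the analyze API with `thai_with_ngram` is a quick way to see whether the resulting tokens are what you expect.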


(system) #4