Hello Nik.
Thanks for your advice.
I just tried what you advised, but I got the following error.
"error": "IndexCreationException[[search] failed to create index]; nested:
CreationException[Guice creation errors:\n\n1) Could not find a suitable
constructor in org.apache.lucene.analysis.th.ThaiWordFilterFactory. Classes
must have either one (and only one) constructor annotated with @Inject or a
zero-argument constructor that is not private.\n at
org.apache.lucene.analysis.th.ThaiWordFilterFactory.class(Unknown Source)\n
at
org.elasticsearch.index.analysis.TokenFilterFactoryFactory.create(Unknown
Source)\n at
org.elasticsearch.common.inject.assistedinject.FactoryProvider2.initialize(Unknown
Source)\n at unknown\n\n1 error]; ",
In my opinion, this error is raised because ThaiWordFilterFactory doesn't have a
zero-argument constructor. In fact, ThaiWordFilterFactory has only the
following constructor.
  /** Creates a new ThaiWordFilterFactory */
  public ThaiWordFilterFactory(Map<String,String> args) {
    super(args);
    assureMatchVersion();
    if (!args.isEmpty()) {
      throw new IllegalArgumentException("Unknown parameters: " + args);
    }
  }
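To illustrate why the injector complains, here is a minimal sketch using plain Java reflection rather than Guice itself. MapOnlyFactory is a hypothetical stand-in with the same constructor shape as ThaiWordFilterFactory; it is not the Lucene class, and the helper method names are made up for this example.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Map;

public class ConstructorCheck {

    // Hypothetical stand-in: its only constructor takes a Map, mirroring
    // the shape of the constructor quoted above.
    static class MapOnlyFactory {
        private final Map<String, String> args;

        public MapOnlyFactory(Map<String, String> args) {
            this.args = args;
        }
    }

    // Returns true if the class declares a non-private zero-argument constructor.
    static boolean hasUsableZeroArgConstructor(Class<?> type) {
        try {
            Constructor<?> ctor = type.getDeclaredConstructor();
            return !Modifier.isPrivate(ctor.getModifiers());
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // Returns true if the class declares a constructor taking a single Map.
    static boolean hasMapConstructor(Class<?> type) {
        try {
            type.getDeclaredConstructor(Map.class);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // No zero-argument constructor exists on the stand-in class.
        System.out.println(hasUsableZeroArgConstructor(MapOnlyFactory.class));
        // The Map constructor is there, but nothing tells the injector to use it.
        System.out.println(hasMapConstructor(MapOnlyFactory.class));
        // For comparison, ArrayList has a public zero-argument constructor.
        System.out.println(hasUsableZeroArgConstructor(ArrayList.class));
    }
}
```

A class shaped like this fails the "zero-argument constructor that is not private" half of the rule quoted in the error message, and it carries no @Inject annotation either, which would explain the CreationException.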
If you don't mind, I have one more question. Can I define a constructor
argument in the settings JSON above?
On Friday, February 7, 2014 at 11:17:59 PM UTC+9, Nikolas Everett wrote:
If you don't like the language analyzer you have to rebuild it as a custom
analyzer then add what you need to it.
{
  "analyzer": {
    "thai_with_ngram": {
      "type": "custom",
      "tokenizer": "standard",
      "filters": ["standard", "lowercase", "thai", "thai_stop", "ngram"]
    }
  },
  "filter": {
    "thai": {
      "type": "org.apache.lucene.analysis.th.ThaiWordFilterFactory"
    },
    "thai_stop": {
      "type": "stop",
      "stopwords_path": "org/apache/lucene/analysis/th/stopwords.txt"
    },
    "ngram": { your ngram configuration here }
  }
}
That builds it with your ngram configuration, I think. I'm taking quite a few
educated guesses here so I expect you to have to fiddle with it to get it
right.
How I did this:
- Open the class called ThaiAnalyzer in the Lucene version Elasticsearch
is using and find the method called createComponents. For me this is
simple because I have Elasticsearch open in Eclipse.
- That method defines the tokenizer (standard) and some filters
(standard, lowercase, ThaiWordFilter, and stop). You have to be able to
translate the class names to Elasticsearch's easier names to get this to
work properly.
- Now build it as a custom analyzer with your extra filter in there. That
is "thai_with_ngram" above.
- Next you'll need to define all the filters that don't exist by default
in Elasticsearch. In this case that is thai, thai_stop, and your ngram
filter. In order:
  - The thai filter doesn't have an easy Elasticsearch mapping so you have
  to tell Elasticsearch the class name to load. That class doesn't take any
  configuration so we're done.
  - The thai_stop filter is just a regular stop word filter with Thai stop
  words. But Elasticsearch doesn't have an easy name to reference the Thai
  stop words file. That isn't too bad, as you can load the stopwords file
  from the classpath. It lives in Lucene at the path I added above.
  - The ngram filter is yours to build but it is well documented.
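As a rough illustration of what an ngram filter emits for each token, here is a small plain-Java sketch. It is not Lucene's or Elasticsearch's implementation; the class name NGramSketch, the method ngrams, and the parameter names minGram and maxGram are assumptions chosen for this example, and the real filter may order its output differently.

```java
import java.util.ArrayList;
import java.util.List;

public class NGramSketch {

    // Emits every substring of `token` whose length is between minGram and
    // maxGram (inclusive), ordered by start position and then by length.
    // Lengths are counted in Java chars (UTF-16 code units).
    static List<String> ngrams(String token, int minGram, int maxGram) {
        if (minGram < 1 || maxGram < minGram) {
            throw new IllegalArgumentException("need 1 <= minGram <= maxGram");
        }
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < token.length(); start++) {
            for (int len = minGram;
                 len <= maxGram && start + len <= token.length();
                 len++) {
                grams.add(token.substring(start, start + len));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // prints [se, sea, ea, ear, ar, arc, rc, rch, ch]
        System.out.println(ngrams("search", 2, 3));
    }
}
```

In the analyzer above, a step like this would run last, on each token that survives the Thai word splitting and the stop word filter.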
That took longer than I expected but it was worth the exercise so I'll
remember how to do it again when I need it. For reference, I do it for
English which has more filters but they all have easy names.
Nik
On Fri, Feb 7, 2014 at 12:59 AM, Min Cha <mins...@gmail.com> wrote:
Hi folks.
I would like to develop a search system for the Thai language.
First of all, I found the Thai analyzer, and it seemed good.
However, it doesn't meet all of my requirements,
so I decided to extend it.
For example, I would like to add an nGram token filter to the Thai analyzer
without making any changes to the analyzer itself.
How can I do this?
Please give me some advice.
--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearc...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/5041f397-8732-413f-8e50-46e25610c639%40googlegroups.com
.
For more options, visit https://groups.google.com/groups/opt_out.