and I've submitted the following sentence for indexing {"parsedtext":
u"ฉันนั่งตากลม"}, but for the benefit of those in the forum I'll
romanize it as "channangtaklom" (the Thai language has no spaces between
words).
Now, I can query for the string "tak", but I can't search for "taklom".
What am I missing?
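For what it's worth, here is a toy sketch (plain Python, not the actual Lucene code) of one way this can happen: if the analyzer word-segments the Thai text into tokens such as "tak" and "lom" at index time, then a term query for the concatenation "taklom" can never match, because only the individual segments exist in the inverted index. The segmentation shown is an assumption for illustration, not the real output of ThaiAnalyzer.

```python
# Toy model of an inverted index. The segmented token list below is an
# assumed word segmentation, not the actual output of Lucene's ThaiAnalyzer.
from collections import defaultdict

def build_index(docs):
    """Map each token to the set of document ids containing it."""
    inverted = defaultdict(set)
    for doc_id, tokens in docs.items():
        for token in tokens:
            inverted[token].add(doc_id)
    return inverted

# Pretend the analyzer split the sentence into these word tokens.
segmented = {"doc1": ["chan", "nang", "tak", "lom"]}
inverted = build_index(segmented)

print(inverted.get("tak"))     # -> {'doc1'}: the segment exists as a term
print(inverted.get("taklom"))  # -> None: the concatenation was never indexed
```

So whether "taklom" is findable depends entirely on where the tokenizer decides the word boundaries are.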
Hey,
This is a pretty lame answer, but I believe all this functionality
is inherited directly from Lucene, so you may be able to get more
detailed answers on that project's discussion group. This ticket seems
to be the one that introduced this support, and may have some answers: https://issues.apache.org/jira/browse/LUCENE-503
Again, this isn't the answer I'd prefer to give, but if none of that
yields much info, I'd recommend digging into the code.
I had a chat with Mishari on IRC, and something is strange since the
behavior from the stock Lucene ThaiAnalyzer was not as he expected when used
with elasticsearch (though really, it just delegates to it). Mishari, any
updates?
It took a while before I could figure out what's going on. It seems
that when the tokenizer comes across a transliterated word, it either
prepends a word to it or appends one, so I suppose I should start
digging into Lucene then. The question is: if I fix the bug, how can
I get the patch into elasticsearch for use ASAP?
If you manage to fix it, you can either build your own analyzers from
Lucene and replace the Lucene jar that ships with elasticsearch, or create
your own custom analyzer and register it with elasticsearch.
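For the second option, registration is done through index analysis settings. The sketch below only wires together built-in components (the name my_thai is made up, and the exact settings keys vary by elasticsearch version); a patched Lucene analyzer class would instead need to be exposed to elasticsearch, e.g. via a custom analyzer provider.

```json
{
  "index": {
    "analysis": {
      "analyzer": {
        "my_thai": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  }
}
```

Once the index is created with these settings, the analyzer can be referenced by name in the field mapping.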