Issue with using Thai Language Analyzer

Mishari · September 19, 2011, 3:47am

Hi,

I'm having an issue with the Thai language analyzer. I have the
following mapping defined:

{ u'parsedtext': { 'index': 'analyzed', 'store': 'yes', 'type':
u'string', 'index_analyzer': u'thai', 'search_analyzer': u'thai' } }

and I've submitted the following sentence for indexing: {"parsedtext":
u"ฉันนั่งตากลม"}. For the benefit of those in the forum I will
romanize it to "channangtaklom" (the Thai language has no spaces between
words).

Now, I can query for the string "tak" but I can't search for "taklom".
What am I missing?
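
One way to narrow this down is to look at the tokens the analyzer actually emits and compare them with the analyzed query terms. The following is a minimal sketch, not a confirmed diagnosis: it assumes a recent Elasticsearch version whose `_analyze` endpoint accepts a JSON body with `analyzer` and `text` (older releases may expect different parameters), and a node at `http://localhost:9200`. The helper names are hypothetical, and `unmatched_terms` is a simplified view of term matching.

```python
import json
import urllib.request


def build_analyze_request(base_url, text, analyzer="thai"):
    """Build (but do not send) a POST request to the _analyze endpoint."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def tokens_from_response(response):
    """Extract the token strings from a parsed _analyze response."""
    return [entry["token"] for entry in response.get("tokens", [])]


def unmatched_terms(indexed_tokens, query_tokens):
    """Return the query tokens that do not appear among the indexed tokens."""
    indexed = set(indexed_tokens)
    return [token for token in query_tokens if token not in indexed]


# Against a live node (the address is an assumption):
#   req = build_analyze_request("http://localhost:9200", "ฉันนั่งตากลม")
#   with urllib.request.urlopen(req) as resp:
#       print(tokens_from_response(json.load(resp)))
```

Running the indexed sentence and the failing query through the same analyzer and passing both token lists to `unmatched_terms` shows which query terms have no counterpart in the index.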

ppearcy · September 20, 2011, 10:11pm

Hey,
This is a pretty lame answer, but I believe all this functionality
is inherited directly from Lucene. You may be able to get some more
detailed answers on that discussion group. This ticket seems to track
the introduction of this support and may have some answers:
https://issues.apache.org/jira/browse/LUCENE-503

Again, this isn't the answer I'd prefer to give, but if none of these
yields much info, I'd recommend digging into the code.

Hope this is at least marginally helpful

Best Regards,
Paul

kimchy · September 20, 2011, 11:15pm

I had a chat with Mishari on IRC, and something is strange since the
behavior from the stock Lucene ThaiAnalyzer was not as he expected when used
with elasticsearch (though really, it just delegates to it). Mishari, any
updates?

Mishari · October 18, 2011, 6:27pm

Hi,

It took a while before I could figure out what's going on. It seems
that when the tokenizer comes across a transliterated word, it either
prepends a word to it or appends one, so I suppose I should start
digging into Lucene. The question is: if I fix the bug, how can
I get the patch into elasticsearch as soon as possible?

kimchy · October 18, 2011, 9:15pm

If you manage to fix it, you can either have your own analyzers built from
Lucene and replace the Lucene jar that comes with elasticsearch, or create
your own custom analyzer that is registered with elasticsearch.
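
For the second route, a custom analyzer can also be declared in index settings without touching the jar. The sketch below is only an illustration under assumptions: it targets a recent Elasticsearch version that ships a `thai` tokenizer and uses typeless mappings, which differs from the 2011 mapping syntax (`string`, `index_analyzer`) quoted earlier in this thread. The analyzer name `my_thai` is hypothetical. A patched tokenizer would still have to be packaged separately (for example as a plugin); this only shows how a named analyzer is wired to the field.

```python
import json


def custom_thai_index_body(analyzer_name="my_thai", field="parsedtext"):
    """Return an index-creation body with a custom analyzer applied to one field."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    analyzer_name: {
                        "type": "custom",
                        # Built-in Thai tokenizer in recent versions (assumption).
                        "tokenizer": "thai",
                        "filter": ["lowercase"],
                    }
                }
            }
        },
        "mappings": {
            "properties": {
                # The same analyzer is used at index and search time by default.
                field: {"type": "text", "analyzer": analyzer_name}
            }
        },
    }


# Print the JSON that would be sent when creating the index.
print(json.dumps(custom_thai_index_body(), indent=2))
```

Swapping the `tokenizer` or `filter` entries is then a settings change rather than a rebuild of Lucene.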
