Ask for suggestion on what analyzer to use

Youxu · February 27, 2015, 9:40am

I have a document like this:

{"Title":"This is my test title"}

This question was asked before, but my reply to the original post was never
accepted by the mailing list. Sorry to re-ask the same question in a new
post.

I want to return the doc whose title matches exactly, while ignoring case
and redundant white space between words. That is, all of these queries
should match the doc: "this is my test TITLE", "this is my test title" and
"this is    my test title".

My initial idea is to define a custom analyzer with the keyword tokenizer and
a lowercase filter:
"settings" : {
  "analysis" : {
    "analyzer" : {
      "lowercase_keyword" : {
        "type" : "custom",
        "tokenizer" : "keyword",
        "filter" : "lowercase"
      }
    }
  }
}

and use the lowercase_keyword analyzer for the title field:
"properties" : {
  "title" : {
    "type" : "string",
    "analyzer" : "lowercase_keyword"
  }
}

It works well for both "this is my test TITLE" and "this is my test title",
but it does not work for the redundant white space case, such as
"this is    my test title".

How can I define my custom analyzer to achieve my goal?

Thanks!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/3895e674-0434-4905-8279-5cdb4a7bc22d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

dadoonet · February 27, 2015, 9:42am

Maybe add the trim token filter to your filters? http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-trim-tokenfilter.html
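As a rough sketch of that suggestion, assuming the `lowercase_keyword` analyzer from the original post, the filter setting would become a list that also names the built-in `trim` token filter:

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercase_keyword": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase", "trim"]
        }
      }
    }
  }
}
```

Note that `trim` only strips whitespace at the start and end of a token, so on its own it would not touch extra spaces between words.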

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet https://twitter.com/dadoonet | @elasticsearchfr https://twitter.com/elasticsearchfr | @scrutmydocs https://twitter.com/scrutmydocs


Youxu · February 27, 2015, 10:32am

Thanks David.
But unfortunately, it does not work.
I tried the _analyze API. With the input "this is test Title" (with redundant
spaces between the words), the output is:
{"tokens":[{"token":"this is test title","start_offset":0,"end_offset":30,"type":"word","position":1}]}
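Because the keyword tokenizer emits the whole input as a single token, the inner whitespace has to be normalized separately. One option, not confirmed anywhere in this thread, is a `pattern_replace` char filter that collapses each run of whitespace into one space before tokenization. In the sketch below the filter name `collapse_whitespace` is a hypothetical label chosen for illustration; `pattern_replace`, `keyword`, `lowercase` and `trim` are built-in Elasticsearch components.

```json
{
  "settings": {
    "analysis": {
      "char_filter": {
        "collapse_whitespace": {
          "type": "pattern_replace",
          "pattern": "\\s+",
          "replacement": " "
        }
      },
      "analyzer": {
        "lowercase_keyword": {
          "type": "custom",
          "char_filter": ["collapse_whitespace"],
          "tokenizer": "keyword",
          "filter": ["lowercase", "trim"]
        }
      }
    }
  }
}
```

With the title field mapped to this analyzer as in the original post, the same normalization should apply at both index and search time, so an analyzed query such as a match query on the title would likely see the same single token regardless of case or spacing. It is worth checking the result with the _analyze API before relying on it.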

