Ask for suggestion on what analyzer to use

Youxu · February 9, 2015, 5:37am

I have a document like this:

{"Title":"This is my test title"}

I want to return the doc only when the title matches exactly, but the match
should be case-insensitive and tolerate redundant whitespace between words.
That is, all of these queries should match the doc: "this is my test TITLE",
"this is my test title", and "this is   my test   title".

My initial idea is to define a custom analyzer with the keyword tokenizer and
a lowercase filter:
"settings" : {
    "analysis" : {
        "analyzer" : {
            "lowercase_keyword" : {
                "type" : "custom",
                "tokenizer" : "keyword",
                "filter" : "lowercase"
            }
        }
    }
}

and use the lowercase_keyword analyzer for the title field:
"properties" : {
    "title" : {
        "type" : "string",
        "analyzer" : "lowercase_keyword"
    }
}

It works well for both "this is my test TITLE" and "this is my test title",
but it does not work for the redundant-whitespace case, such as
"this is   my test   title".

How can I define my custom analyzer to achieve my goal?

Thanks!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/fb146ea8-30d6-46c4-8f5b-d7412249a900%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

dadoonet · February 9, 2015, 5:54am

Maybe keep the default analyzer but use a phrase search?

--
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
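
For reference, a phrase search along these lines could look like the following. This is only a sketch: it assumes the field is named title and is indexed with the default (standard) analyzer, which lowercases terms and splits on whitespace, so letter case and repeated spaces in the query should not matter.

```json
{
  "query": {
    "match_phrase": {
      "title": "this is   my test   TITLE"
    }
  }
}
```

Note that a phrase query also matches a document whose title merely contains the phrase, so a shorter query such as "this is my test" would still match.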

Youxu1 · February 9, 2015, 6:10am

Thanks for the reply!
Yes, match_phrase with the default analyzer works to some extent.
But I would like a better solution that:

1. Indexes the title field as one single token, instead of the multiple tokens produced by the standard analyzer.
2. Only matches the doc when the query contains all the words of the title; that is, a search for "this is my test" won't match the doc.
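
One possible way to meet both requirements, offered as an untested sketch rather than a confirmed answer from this thread, is to keep the keyword tokenizer and normalize whitespace before tokenizing with a pattern_replace character filter, then strip leading and trailing spaces with the trim token filter. The name collapse_whitespace is made up for illustration.

```json
{
  "settings": {
    "analysis": {
      "char_filter": {
        "collapse_whitespace": {
          "type": "pattern_replace",
          "pattern": "\\s+",
          "replacement": " "
        }
      },
      "analyzer": {
        "lowercase_keyword": {
          "type": "custom",
          "char_filter": ["collapse_whitespace"],
          "tokenizer": "keyword",
          "filter": ["lowercase", "trim"]
        }
      }
    }
  }
}
```

With this analyzer on the title field, a match query (which analyzes the query text with the field's analyzer) should reduce "this is   my test   TITLE" to the single token "this is my test title", while "this is my test" yields a different token and does not match. A term query bypasses analysis, so it would not benefit from this.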

Youxu1 · February 10, 2015, 2:54am

Does anyone have any good suggestions?
