I want to return the doc whose title matches exactly, except that the match
should be case-insensitive and should ignore redundant whitespace between
words. That is, all of these queries: "this is my test TITLE", "this is my
test title" and "this is   my test title" (extra spaces between words)
should match the doc.
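To pin down the matching rule I am after, here is a minimal Python sketch. It is not Elasticsearch code, and the helper name normalize_title is purely illustrative: two titles should count as equal when they agree after lowercasing and collapsing runs of whitespace. Ignoring leading and trailing spaces as well is an assumption on my part.

```python
import re

def normalize_title(text):
    """Hypothetical helper: lowercase and collapse whitespace runs."""
    # strip() also drops leading/trailing whitespace (an assumption,
    # the requirement above only mentions whitespace between words).
    return re.sub(r"\s+", " ", text.strip()).lower()

queries = [
    "this is my test TITLE",
    "this is my test title",
    "this is   my test title",  # redundant spaces between words
]

# All three queries should reduce to one and the same normalized form.
normalized = {normalize_title(q) for q in queries}
```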
My initial idea is to define a custom analyzer with the keyword tokenizer
and the lowercase filter:
"settings" : {
    "analysis" : {
        "analyzer" : {
            "lowercase_keyword" : {
                "type" : "custom",
                "tokenizer" : "keyword",
                "filter" : "lowercase"
            }
        }
    }
}
and use the lowercase_keyword analyzer for the title field:
"properties" : {
    "title" : {
        "type" : "string",
        "analyzer" : "lowercase_keyword"
    }
}
It works well for both "this is my test TITLE" and "this is my test title",
but it does not work for the redundant-whitespace case, such as
"this is   my test title".
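To make the failure concrete, here is a rough Python emulation of what I understand the analyzer to do. It is a sketch, not Elasticsearch code, and the function name simply mirrors my analyzer name: the keyword tokenizer emits the whole input as one token and the lowercase filter lowercases it, so any extra spaces survive into the token.

```python
def lowercase_keyword(text):
    """Rough emulation of the analyzer: keyword tokenizer + lowercase filter."""
    # keyword tokenizer: the entire input becomes a single token
    tokens = [text]
    # lowercase filter: lowercase every token, whitespace is left untouched
    return [t.lower() for t in tokens]

indexed = lowercase_keyword("this is my test title")

# Differs from the indexed title only by case: same token.
case_variant = lowercase_keyword("this is my test TITLE")

# Differs by redundant spaces: the token keeps them, so it is a different term.
space_variant = lowercase_keyword("this is   my test title")
```

Here case_variant equals indexed, while space_variant does not, which is exactly the behaviour I am seeing.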
How can I define my custom analyzer to achieve my goal?