POST /index/type/1
{
  "title": "Hello Elasticsearch"
}
I want to return the document only when its title matches exactly, while tolerating redundant whitespace between words.
For example,
Searching for "Hello[space]elasticsearch" or "Hello[space][space][space]Elasticsearch" should return the document, but searching for just "Hello" or "Elasticsearch" should not.
Any suggestions on which analyzer, tokenizer, or filter I should use?
You can use a combination of the "pattern_replace" and "trim" token filters to remove redundant whitespace from your input fields and from the query. Unfortunately, you cannot use those filters as "normalizers" for "keyword" fields, but you can define the field as a "text" field and use the "keyword" tokenizer to get almost the same effect. Here is what I mean:
The pattern_replace filter replaces each run of whitespace characters with a single space, and the trim filter strips whitespace from the start and end of the value. Both are applied at index time and at query time for this field. So if you index:
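A minimal sketch of such an index definition, sent as the body of `PUT /index`. The analyzer name "my_analyzer" matches the _analyze request in this answer; the filter name "whitespace_collapse" and the `\s+` pattern are assumptions chosen for illustration. The mapping uses the typed layout (type name "type") that goes with the `/index/type/1` URLs in this thread; on Elasticsearch 7 and later the type level would be dropped.

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "whitespace_collapse": {
          "type": "pattern_replace",
          "pattern": "\\s+",
          "replacement": " "
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["whitespace_collapse", "trim"]
        }
      }
    }
  },
  "mappings": {
    "type": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}
```

Because the keyword tokenizer emits the whole value as a single token, a query for only "Hello" or only "Elasticsearch" produces a different token and should not match. If matching should also ignore case, as "Hello elasticsearch" in the question suggests, adding the built-in "lowercase" filter to the filter list is one option.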
PUT /index/type/1
{
  "title" : "Elasticsearch In Action "
}
You should be able to query it with a different whitespace distribution as well:
POST /index/_search
{
  "query": {
    "match": {
      "title": "Elasticsearch In Action "
    }
  }
}
You can check that the token produced by the analysis contains only a single space between words:
GET /index/_analyze
{
  "analyzer": "my_analyzer",
  "text" : " Elasticsearch In Action "
}