Hi,
I have some problems configuring my ES index.
I want to perform a search that is case-insensitive, accent-insensitive, and handles special characters.
Here is the index creation script:
```
DELETE index-test

PUT index-test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "SearchAnalyzer": {
          "type": "custom",
          "filter": ["lowercase", "asciifolding"],
          "tokenizer": "whitespace"
        },
        "IndexAnalyzer": {
          "type": "custom",
          "filter": ["lowercase", "asciifolding"],
          "tokenizer": "whitespace"
        }
      }
    }
  },
  "mappings": {
    "myType": {
      "properties": {
        "myField": {
          "type": "text",
          "analyzer": "IndexAnalyzer",
          "search_analyzer": "SearchAnalyzer"
        }
      }
    }
  }
}

PUT index-test/myType/1
{
  "myField": "aB-é"
}
```
Consider this search request:
```
GET index-test/myType/_search
{
  "query": {
    "bool": {
      "filter": [{
        "wildcard": {
          "myField": {
            "value": "*{charToSearch}*"
          }
        }
      }]
    }
  }
}
```
It works as expected if you replace {charToSearch} with any of these characters: "a", "A", "b", "B", "-", "e".
Unfortunately, it does not work with "é".
It's as if the SearchAnalyzer's "asciifolding" filter is not applied.
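For what it's worth, the analyzer itself looks fine when I sanity-check it with the _analyze API:
```
GET index-test/_analyze
{
  "analyzer": "SearchAnalyzer",
  "text": "aB-é"
}
```
This returns the single token "ab-e", which suggests the folding happens at analysis time but not on the wildcard value itself.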
EDIT: sorry for the poor formatting of my first message and for the delay.
Thanks for any help, Valentin.
dadoonet
(David Pilato)
December 22, 2017, 5:11pm
2
Please format your code using the </> icon as explained in this guide. It will make your post more readable.
Or use markdown style like:
```
CODE
```
Could you provide a full recreation script as described in About the Elasticsearch category?
It will help to better understand what you are doing.
Please try to keep the example as simple as possible.
Hi,
I added a recreation script and formatted the code.
Thanks for any help!
dadoonet
(David Pilato)
January 2, 2018, 9:37am
4
I didn't see that...
An alternative could be to perform the "asciifolding" on the client side.
Do you have a better idea to perform this search?
val
(Val Crettaz)
January 2, 2018, 10:32am
6
Do not use a wildcard query; instead, include an ngram token filter in IndexAnalyzer so that your text is sliced and diced into smaller chunks, all while being ascii-folded. No need to change the SearchAnalyzer, though.
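Something along these lines should do the trick (just a sketch, untested; the my_ngram filter name and the min_gram/max_gram values are only examples and need to be tuned for your data):
```
# my_ngram, min_gram and max_gram are example values, adapt them to your needs
PUT index-test
{
  "settings": {
    "analysis": {
      "filter": {
        "my_ngram": {
          "type": "ngram",
          "min_gram": 1,
          "max_gram": 2
        }
      },
      "analyzer": {
        "IndexAnalyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase", "asciifolding", "my_ngram"]
        },
        "SearchAnalyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "myType": {
      "properties": {
        "myField": {
          "type": "text",
          "analyzer": "IndexAnalyzer",
          "search_analyzer": "SearchAnalyzer"
        }
      }
    }
  }
}
```
With that in place, a plain match query on myField should find the document for "é", "E", "b", and so on, no wildcard needed:
```
GET index-test/myType/_search
{
  "query": {
    "match": {
      "myField": "é"
    }
  }
}
```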
1 Like
Thanks for the suggestion, I will give it a try.
The SearchAnalyzer seems to work like I wanted with the ngram token filter.
I need to search for long strings (a GUID for example -> c163e2b5-5362-e556-490a-867a9cd63bc3), which is 36 characters.
What is the maximum "max_gram" value I should not exceed to avoid performance problems?
Is there a better solution than ngram to perform a "contains" query on big values?
system
(system)
Closed
January 30, 2018, 12:49pm
9
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.