I've added a custom analyzer that uses the asciifolding filter, as follows:
index:
  analysis:
    analyzer:
      eulang:
        type: custom
        tokenizer: standard
        filter: [standard, lowercase, asciifolding, stop]
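To make the expected behaviour concrete, here is a rough Python approximation of what the eulang chain should do to a piece of text. It is only a sketch, not Elasticsearch's implementation: the standard tokenizer is approximated by a regex, the "standard" token filter is treated as a no-op, asciifolding is approximated with Unicode NFKD decomposition (Lucene actually uses its own folding table), and the stop list is a small illustrative subset of the real English list.

```python
import re
import unicodedata

# Illustrative subset of English stop words (assumption: the real "stop"
# filter uses Lucene's full default English list).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in"}


def tokenize(text):
    # Rough stand-in for the standard tokenizer: runs of word characters.
    return re.findall(r"\w+", text)


def ascii_fold(token):
    # Approximates asciifolding: decompose accented characters (NFKD)
    # and drop the combining marks, so "é" becomes "e".
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def eulang(text):
    # Filters in the configured order: lowercase, asciifolding, stop.
    # (The "standard" token filter is treated as a no-op here.)
    tokens = [t.lower() for t in tokenize(text)]
    tokens = [ascii_fold(t) for t in tokens]
    return [t for t in tokens if t not in STOP_WORDS]


if __name__ == "__main__":
    print(eulang("Café and event space. À côté."))
    print(eulang("café"), eulang("cafe"))
```

If the same chain runs on both the document and the query, "café" and "cafe" both reduce to the single term "cafe", which is why both queries would be expected to match.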
I then created a new index with the following mapping (so that the title,
notes, and tags fields use the eulang analyzer):
{
  "place" : {
    "_all" : {"enabled" : false},
    "properties" : {
      "user_id" : {"type" : "integer", "index" : "not_analyzed"},
      "title" : {"type" : "string", "boost" : 1.5, "analyzer" : "eulang"},
      "notes" : {"type" : "string", "analyzer" : "eulang"},
      "tags" : {"type" : "string", "index_name" : "tag", "boost" : 1.5, "analyzer" : "eulang"},
      "created_on" : {"type" : "date", "format" : "yyyy-MM-dd HH:mm:ss"}
    }
  }
}
After inserting a few documents and querying, I see that ASCII folding
works at index time but, for some reason, not at query time:
$ curl 'http://localhost:9200/places_2010091901/_search?q=notes:café'
{"_shards":{"total":1,"successful":1,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}

$ curl 'http://localhost:9200/places_2010091901/_search?q=notes:cafe'
{"_shards":{"total":1,"successful":1,"failed":0},"hits":{"total":1,"max_score":0.5,"hits":
[{"_index":"places_2010091901","_type":"place","_id":"1","_score":0.5,
"_source" : {"notes": "Café and event space. À côté.", "title": "Watershed",
"user_id": 11, "created_on": "2010-09-19 11:01:19", "tags": ["coffee", "wifi", "view"]}}]}}
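For comparison, the same search URL can be built with the query string percent-encoded as UTF-8 instead of passing the raw "é" through the shell. This is a minimal sketch assuming the same local node and index as the curl commands; whether encoding plays any role in the missing hit is not established here, it only removes one variable.

```python
from urllib.parse import urlencode

# Assumption: same local node and index name as in the curl commands.
BASE = "http://localhost:9200/places_2010091901/_search"


def search_url(query):
    # urlencode percent-encodes the value as UTF-8, so "é" becomes "%C3%A9"
    # and ":" becomes "%3A".
    return BASE + "?" + urlencode({"q": query})


if __name__ == "__main__":
    print(search_url("notes:café"))
```

The printed URL can then be passed to curl unchanged.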
I was under the impression that the asciifolding filter would also be
applied to the query string.