I saw significant performance difference between searching on field with exact field name and searching on field with wildcard suffix i name.
Following is my mapping:
PUT /myindex
{
"mappings": {
"docs": {
"Locale": {
"type": "text",
"fields": {
"lowercaseraw": {
"type": "text",
"analyzer": "lowercase_keyword"
}
},
"analyzer": "standard"
},
"Title": {
"type": "text",
"analyzer": "standard"
},
"Title_english": {
"type": "text",
"analyzer": "english"
},
"Title_french": {
"type": "text",
"analyzer": "french"
},
"Title_finnish": {
"type": "text",
"analyzer": "finnish"
}
}
}
Note: There are totally around 50 Title_ fields each of which use a specific ES language analyzer. I just list a few as example.*
When ingesting document, I will insert the doc with TWO Title, one is for "Title", the other is for "Title_xxx", "xxx" will be determined by the language of the doc.
For example, if the doc is english, then:
POST /myindex/docs/doc_1
{
"Title": "Elasticsearch 5.0 is released!",
"Title_english": "Elasticsearch 5.0 is released!",
"Locale": "en-us"
}
If the doc is french, then:
POST /myindex/docs/doc_1
{
"Title": "Elasticsearch 5.0 is released!",
"Title_french": "Elasticsearch 5.0 is released!",
"Locale": "fr-fr"
}
Similar to other languages.
So, although there are 50 around Title_* fields in mapping, there is always TWO Title fields has value.
Then I saw significant query performance difference when searching english documents with following two different query DSL:
{
"query" : {
"bool": {
"must": {
"multi_match" : {
"query" : "elasticserach 5",
"fields" : ["Title_english"],
"type" : "phrase",
"slop" : 5,
"boost" : 10
}
},
"filter" : {
"bool" : {
"must" :
{
"match" : {
"Locale.lowercaseraw" : "en-us"
}
}
}
}
}
}
}
{
"query" : {
"bool": {
"must": {
"multi_match" : {
"query" : "elasticsearch 5",
"fields" : ["Title_*"],
"type" : "phrase",
"slop" : 5,
"boost" : 10
}
},
"filter" : {
"bool" : {
"must" :
{
"match" : {
"Locale.lowercaseraw" : "en-us"
}
}
}
}
}
}
}
The only difference is, that the #1 DSL use "Title_english", the #2 use "Title_", and please be noticed that the filter makes sure only english documents match the query, that is, only Title and Title_english in the matched documents have value, all other Title_ is null.
Per my rough perf testing using JMeter (200 QPS), the perf result is as follows:
#1:
Avg. CPU usage: 33%
Avg. Response Time: 26 ms
#2
Avg. CPU usage: 67%
Avg. Response Time: 50 ms
Although #2 use "Title_*", the ES actually search on one field, "Title_english", since the filter makes sure only "Title_english" has value.
So why so big difference? Is it bad practice using wildcard suffix in search field name?