My application relies on Elasticsearch to search for people by different attributes. The user enters someone's first name, last name, or both, and the ES client runs a bool query across a handful of fields to find matching terms.
My problem is that, with my current setup, it cannot find people with short names like "Mo Jo". Instead, ES returns "Mo Johnson" even when there is an exact match for "Mo Jo". It is almost as if short strings were not indexed at all.
Here is my template:
{
  "settings": {
    "number_of_shards": 10,
    "analysis": {
      "analyzer": {
        "exact": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase"
          ]
        },
        "startswith": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "startswith_filter"
          ]
        }
      },
      "filter": {
        "startswith_filter": {
          "type": "edgeNGram",
          "min_gram": 1,
          "max_gram": 20
        }
      }
    }
  },
  "mappings": {
    "people": {
      "properties": {
        "display_id": {
          "type": "string",
          "fields": {
            "display_id": {
              "type": "string",
              "analyzer": "exact"
            },
            "_startswith": {
              "type": "string",
              "analyzer": "startswith",
              "search_analyzer": "exact"
            }
          }
        },
        "email": {
          "type": "string",
          "fields": {
            "email": {
              "type": "string",
              "analyzer": "exact"
            },
            "_startswith": {
              "type": "string",
              "analyzer": "startswith",
              "search_analyzer": "exact"
            }
          }
        },
        "first_name": {
          "type": "string",
          "fields": {
            "first_name": {
              "type": "string",
              "analyzer": "exact"
            },
            "_startswith": {
              "type": "string",
              "analyzer": "startswith",
              "search_analyzer": "exact"
            }
          }
        },
        "last_name": {
          "type": "string",
          "fields": {
            "last_name": {
              "type": "string",
              "analyzer": "exact"
            },
            "_startswith": {
              "type": "string",
              "analyzer": "startswith",
              "search_analyzer": "exact"
            }
          }
        },
        "full_name": {
          "type": "string",
          "analyzer": "exact",
          "doc_values": true
        },
        "phone": {
          "type": "string",
          "index": "no",
          "doc_values": true
        },
        "external_id": {
          "type": "string",
          "index": "no",
          "doc_values": true
        }
      }
    }
  }
}
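For reference, this is roughly what I expect the two analyzers in the template to emit. The Python sketch below is only an approximation: it splits on whitespace instead of running the real standard tokenizer, and the function names exact_tokens and startswith_tokens are made up for illustration.

```python
def exact_tokens(text):
    """Approximate the "exact" analyzer: split into words and lowercase.

    Assumption: whitespace splitting stands in for the standard tokenizer.
    """
    return text.lower().split()


def startswith_tokens(text, min_gram=1, max_gram=20):
    """Approximate the "startswith" analyzer: emit every prefix of each
    lowercased word, from min_gram up to max_gram characters."""
    tokens = []
    for word in exact_tokens(text):
        longest = min(max_gram, len(word))
        for size in range(min_gram, longest + 1):
            tokens.append(word[:size])
    return tokens


# Tokens indexed in last_name._startswith for the two people.
indexed_jo = startswith_tokens("Jo")
indexed_johnson = startswith_tokens("Johnson")

# Search terms, analyzed with "exact" because of search_analyzer.
search_terms = exact_tokens("Mo Jo")

print(indexed_jo)
print(indexed_johnson)
print([term for term in search_terms if term in indexed_johnson])
```

In this approximation the token "jo" is produced for both "Jo" and "Johnson" in the _startswith subfields, so a search for "Mo Jo" matches both documents there, and which one comes first then depends on scoring.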
Here is the query, where 'q' is a placeholder for the string the user entered:
{
  "query": {
    "bool": {
      "should": [
        { "match": { "first_name": { "query": q }}},
        { "match": { "first_name._startswith": { "query": q }}},
        { "match": { "last_name": { "query": q }}},
        { "match": { "last_name._startswith": { "query": q }}},
        { "match": { "full_name": { "query": q }}},
        { "match": { "email": { "query": q }}},
        { "match": { "email._startswith": { "query": q }}},
        { "match": { "display_id": { "query": q }}},
        { "match": { "display_id._startswith": { "query": q }}}
      ],
      "minimum_should_match": 1
    }
  }
}
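Since 'q' is not valid JSON on its own, here is a minimal sketch of how the same request body could be assembled in Python before it is sent. The helper name build_query and the FIELDS list are hypothetical, introduced only for this illustration; the field names are the ones from the query above.

```python
import json

# Fields targeted by the bool query, in the same order as above.
FIELDS = [
    "first_name",
    "first_name._startswith",
    "last_name",
    "last_name._startswith",
    "full_name",
    "email",
    "email._startswith",
    "display_id",
    "display_id._startswith",
]


def build_query(q):
    """Return the request body with one match clause per field."""
    should = [{"match": {field: {"query": q}}} for field in FIELDS]
    return {
        "query": {
            "bool": {
                "should": should,
                "minimum_should_match": 1,
            }
        }
    }


# Serialize the body for the user input "Mo Jo".
print(json.dumps(build_query("Mo Jo"), indent=2))
```

Building the body as a dictionary and serializing it with json.dumps ensures the user input ends up as a properly quoted JSON string.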
I have experimented quite a bit with different analyzers and filters, but I have not found a solution that allows searching for short names.
Any pointers would be greatly appreciated.