Hi,
Please help me understand how the n-gram and wildcard field types behave. I am working on an application that offers search by phone number. It will be a "contains" search, e.g. searching for phone numbers containing "234890".
Our Elasticsearch index has close to 1 billion documents.
While looking into options, I came across the wildcard field type, which seems to fit our use case. We haven't done any performance tests yet.
The wildcard field type uses 3-grams, so I wanted to compare a 3-gram field with a wildcard field, but I have a hard time understanding why my example below does not match my expectations.
Here is the example I am using:
PUT ngram-index
{
  "settings": {
    "index": {
      "number_of_shards": "2",
      "number_of_replicas": "1"
    },
    "analysis": {
      "analyzer": {
        "ngram": {
          "tokenizer": "ngram"
        }
      },
      "tokenizer": {
        "ngram": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3
        }
      }
    }
  },
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "ngram_field": {
        "type": "text",
        "analyzer": "ngram"
      },
      "wildcard_field": {
        "type": "wildcard",
        "ignore_above": 25
      }
    }
  }
}
PUT /ngram-index/_doc/1
{
  "ngram_field": "1234567",
  "wildcard_field": "1234567"
}
PUT /ngram-index/_doc/2
{
  "ngram_field": "234890",
  "wildcard_field": "234890"
}
When I run a search on the wildcard field, I get the result I expect, which is document id=2:
POST /ngram-index/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "wildcard": {
            "wildcard_field": {
              "value": "23489*"
            }
          }
        }
      ]
    }
  }
}
However, when I run the same wildcard query against the 3-gram field, I get no results. I was expecting both documents id=1 and id=2 to be returned because the 3-gram "234" exists in both documents.
Can you please help me understand this behavior? Here is the query:
POST /ngram-index/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "wildcard": {
            "ngram_field": {
              "value": "23489*"
            }
          }
        }
      ]
    }
  }
}
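To make my expectation concrete, here is a small plain-Python sketch of how I picture the 3-gram tokenization of the two documents. This is not Elasticsearch code: the helper name `trigrams` is my own, and the use of `fnmatchcase` to stand in for wildcard matching against individual terms is an assumption about how a term-level query would see the field.

```python
from fnmatch import fnmatchcase


def trigrams(text, n=3):
    """Return every contiguous n-character slice of text,
    mimicking an ngram tokenizer with min_gram = max_gram = n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


# The two values indexed into ngram_field above.
docs = {"1": "1234567", "2": "234890"}
tokens = {doc_id: trigrams(value) for doc_id, value in docs.items()}

# Documents that contain the single 3-gram "234".
has_234 = [doc_id for doc_id, toks in tokens.items() if "234" in toks]

# Documents where at least one individual token matches the pattern "23489*".
# Assumption: the pattern is compared against each token separately,
# not against the original string.
matches_pattern = [
    doc_id
    for doc_id, toks in tokens.items()
    if any(fnmatchcase(tok, "23489*") for tok in toks)
]

print(tokens)           # tokens per document
print(has_234)          # documents sharing the "234" token
print(matches_pattern)  # documents with a token matching "23489*"
```

In this sketch both documents contain the token "234", but every token is exactly three characters long, so no single token can start with the five characters "23489". If the wildcard query on a text field is matched against individual indexed terms, that might explain the empty result, but I would like confirmation.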
Thanks,
Moulay