Understanding search-as-you-type Fields

Hello community,

I am running Elasticsearch 8.10.4 on my local machine.

I have been experimenting with search_as_you_type lately and I am confused by the "._2gram" and "._3gram" subfields. I created a basic index as follows:

PUT autosuggest_trial_2
{
  "mappings": {
    "properties": {
      "description": {
        "type": "search_as_you_type",
        "analyzer": "my_custom_analyzer"
      }
    }
  },
  "settings": {
    "analysis": {
      "filter": {
        "my_ascii_folding": {
          "type": "asciifolding"
        }
      },
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "my_ascii_folding", "trim"]
        }
      }
    }
  }
}

Then I built a search query as:

GET autosuggest_trial_2/_search
{
  "query": {
    "bool": {
      "must": {
        "multi_match": {
          "query": "for men shoe",
          "fields": [
            "description",
            "description._2gram",
            "description._3gram"
          ],
          "type": "bool_prefix",
          "operator": "AND"
        }
      }
    }
  }
}

The results are exactly what I wanted to see. However, when I remove "description._2gram" and "description._3gram" from the fields in the search query, the results are the same. So I am confused about why I should list these fields at all. Also, some examples add "._index_prefix" as well.

When I add "profile": true to my search query, I can see that the query uses "description._2gram", which is a relief. Maybe Elasticsearch adds these fields automatically?

Is there an explanation as to why? Thanks.

Hi @safakkbilici,

Welcome to the community! The ._2gram and ._3gram subfields are generated by default because max_shingle_size defaults to 3, as per the documentation.
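
As a rough illustration of that naming, here is a small Python sketch. It is not an Elasticsearch API: the helper name subfield_names is made up for this post, and it only mirrors the naming scheme and the 2 to 4 range for max_shingle_size described in the documentation.

```python
def subfield_names(field: str, max_shingle_size: int = 3) -> list[str]:
    """List the field names created for a search_as_you_type mapping.

    Hypothetical helper for illustration only; it mirrors the documented
    naming scheme and does not talk to Elasticsearch.
    """
    if not 2 <= max_shingle_size <= 4:
        raise ValueError("max_shingle_size must be between 2 and 4")
    # One shingle subfield per size from 2 up to max_shingle_size.
    shingle_fields = [f"{field}._{n}gram" for n in range(2, max_shingle_size + 1)]
    # Root field first, then the shingle subfields, then the prefix subfield.
    return [field, *shingle_fields, f"{field}._index_prefix"]


print(subfield_names("description"))
```

With the default of 3 this lists description, description._2gram, description._3gram and description._index_prefix, which are the names that show up in the queries in this thread.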

When you used the "profile": true option with all fields included in the query, did you see it using both the ._2gram and ._3gram fields?

Hello @safakkbilici,

Indeed, you will get the same hits. A bool_prefix query that targets the root field and its shingle subfields can match the query terms in any order, but it scores documents higher if they contain the terms in order in a shingle subfield.

For example, I index the documents below.

Create index

PUT products
{
  "mappings": {
    "properties": {
      "description": {
        "type": "search_as_you_type"
      }
    }
  }
}

Index Docs

POST products/_doc/
{
  "description": "best jogging shoes for men"
}
POST products/_doc/
{
  "description": "I purchased best sport shoes for upcoming match"
}

Query 1

GET products/_search
{
  "query": {
    "multi_match": {
      "query": "best sport jogging",
      "type": "bool_prefix",
      "fields": [
        "description"
      ]
    }
  }
}

Response

{
  "took": 9,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1.201328,
    "hits": [
      {
        "_index": "products",
        "_id": "i5HdtIsBDw1I7eA1_uoH",
        "_score": 1.201328,
        "_source": {
          "description": "best jogging shoes for men"
        }
      },
      {
        "_index": "products",
        "_id": "jJHktIsBDw1I7eA1supk",
        "_score": 0.79994905,
        "_source": {
          "description": "I purchased best sport shoes for upcoming match"
        }
      }
    ]
  }
}

As you may have noticed, both documents are still returned, but the best match comes second. With only the root field, the query terms are matched individually, so nothing rewards the adjacent pair "best sport". My expectation was that the doc containing "best sport" would come first.
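
To see what the root field alone has to work with, here is a minimal Python sketch of bool_prefix-style matching. It is a simulation under assumptions, not Elasticsearch code: the function names are made up, lowercasing plus whitespace splitting stands in for the standard analyzer, and scoring is ignored entirely.

```python
def tokens(text: str) -> list[str]:
    # Rough stand-in for the standard analyzer: lowercase, split on whitespace.
    return text.lower().split()


def matched_terms(query: str, doc: str) -> list[str]:
    """Return the query terms that a bool_prefix-style match finds in doc.

    Every query term except the last must equal a document token;
    the last term only has to be a prefix of some document token.
    Assumes the query contains at least one term.
    """
    *terms, last = tokens(query)
    doc_tokens = tokens(doc)
    hits = [t for t in terms if t in doc_tokens]
    if any(tok.startswith(last) for tok in doc_tokens):
        hits.append(last)
    return hits


query = "best sport jogging"
for doc in (
    "best jogging shoes for men",
    "I purchased best sport shoes for upcoming match",
):
    print(doc, "->", matched_terms(query, doc))
```

Each document matches two of the three query terms (best and jogging for the first, best and sport for the second), so on the root field there is no signal that "best sport" appears as an adjacent pair in only one of them.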

Let's try another query.

Query 2

GET products/_search
{
  "query": {
    "multi_match": {
      "query": "best sport jogging",
      "type": "bool_prefix",
      "fields": [
        "description",
        "description._2gram",
        "description._3gram"
      ]
    }
  }
}

Response

{
  "took": 12,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1.4235239,
    "hits": [
      {
        "_index": "products",
        "_id": "jJHktIsBDw1I7eA1supk",
        "_score": 1.4235239,
        "_source": {
          "description": "I purchased best sport shoes for upcoming match"
        }
      },
      {
        "_index": "products",
        "_id": "i5HdtIsBDw1I7eA1_uoH",
        "_score": 1.201328,
        "_source": {
          "description": "best jogging shoes for men"
        }
      }
    ]
  }
}

This time I get the expected order because "best sport" matches in the description._2gram field, which boosts the score.
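
The shingle overlap can be sketched in a few lines of Python. This is only an approximation for illustration: the helper names are made up, whitespace splitting stands in for the real analyzer, and the prefix handling that bool_prefix applies to the final shingle is left out.

```python
def shingles(text: str, size: int) -> list[str]:
    """Return the word n-grams (shingles) of the given size, lowercased."""
    words = text.lower().split()
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


def shared_shingles(query: str, doc: str, size: int = 2) -> list[str]:
    """Return the query shingles that also occur in the document."""
    doc_shingles = set(shingles(doc, size))
    return [s for s in shingles(query, size) if s in doc_shingles]


query = "best sport jogging"
for doc in (
    "best jogging shoes for men",
    "I purchased best sport shoes for upcoming match",
):
    print(doc, "->", shared_shingles(query, doc))
```

Only the second document shares a 2-word shingle ("best sport") with the query, which corresponds to the extra match on description._2gram that lifts it to the top in Query 2.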
