Understanding a match query on a search-as-you-type index

dlants · March 21, 2022, 2:48pm

I'm curious whether there is any way to see the Lucene query that's generated for a particular Elasticsearch DSL query.

For example, if I create an index:

  await client.indices.create({
    index: 'activities',
    mappings: {
      dynamic: false,
      properties: {
        title: { type: 'search_as_you_type', analyzer: 'english' },
        description: { type: 'search_as_you_type', analyzer: 'english' },
        keywords: {
          type: 'search_as_you_type',
          analyzer: 'english',
          fields: {
            exact: {
              type: 'keyword'
            }
          }
        },
        searchable: { type: 'boolean' }
      }
    }
  });

and then query it:

  // `query` holds the user's search text, e.g. 'turtle time tri'
  await client.search({
    index: 'activities',
    query: {
      multi_match: {
        query,
        type: 'bool_prefix',
        minimum_should_match: '75%',
        fields: [
          'title^2',
          'title._2gram^2',
          'title._3gram^2',
          'description',
          'description._2gram',
          'description._3gram'
        ]
      }
    }
  });

There's a lot going on here. Fields are analyzed and tokenized. There are the "operator" and "minimum_should_match" settings on the bool query, and the last term is treated differently (targeting the _index_prefix subfield, I assume, though that's implicit in the query).
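
For the percentage form of minimum_should_match, the Elasticsearch documentation describes the computed clause count as rounded down. Below is a minimal sketch of that arithmetic only. The helper name is made up for illustration, and it does not settle which clauses count as optional inside a bool_prefix query.

```javascript
// Hypothetical helper: reproduces only the documented round-down arithmetic
// for a non-negative percentage. Not taken from the Elasticsearch source.
function minShouldMatchFromPercent(optionalClauses, percent) {
  if (!Number.isInteger(optionalClauses) || optionalClauses < 0) {
    throw new RangeError('optionalClauses must be a non-negative integer');
  }
  if (!(percent >= 0 && percent <= 100)) {
    throw new RangeError('percent must be between 0 and 100');
  }
  return Math.floor((optionalClauses * percent) / 100);
}

console.log(minShouldMatchFromPercent(3, 75)); // 2 (2.25 rounded down)
console.log(minShouldMatchFromPercent(4, 75)); // 3
```

So with a three-term input such as "turtle time tri", "75%" would require two of three optional clauses under this reading.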

I know that I can use GET /activities/_explain/5ddbf9ae009cd90bcdeaadd7 to explain the scoring for any particular document returned, but I'm curious if there's a way to see the Lucene query that's generated from this DSL. Reading through MultiMatchQueryBuilder is proving rather convoluted, though that's the best source of info on this I've found.

Is there a description of the algorithm that converts the DSL into a Lucene query? Or better yet, a tool like the explain endpoint that can show me the Lucene query generated from the DSL?

dlants · March 21, 2022, 3:04pm

Answering my own question, since ES support was very helpful with this.

The Profile API exposes a query description and query structure. So for the example above,

GET /activities/_search
{
  "profile": true,
  "query": {
    "multi_match": {
      "query": "turtle time tri",
      "type": "bool_prefix",
      "operator": "and",
      "fields": [
        "title^2",
        "title._2gram^2",
        "title._3gram^2",
        "description",
        "description._2gram",
        "description._3gram"
    ]
    }
  }
}

Returns the following structure:

BooleanQuery
  ConstantScore(description._index_prefix:turtl time tri)
  (ConstantScore(title._index_prefix:turtl time tri))^2.0
  (+title:turtl +title:time +ConstantScore(title._index_prefix:tri))^2.0
  (+description:turtl +description:time +ConstantScore(description._index_prefix:tri))
  (+description._2gram:turtl time +ConstantScore(description._index_prefix:time tri))
  (+title._2gram:turtl time +ConstantScore(title._index_prefix:time tri))^2.0
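
The outline above is condensed from the profile section of the response. As a rough sketch, assuming the usual profile layout (profile.shards[].searches[].query[], where each node carries type, description, and optional children), a small walker can print such an outline. The function name and the sample object are illustrative and hand-written, not real server output.

```javascript
// Walk the `profile` section of a search response and return one indented
// line per query node. Only `type`, `description`, and `children` are read.
function outlineProfile(response) {
  const lines = [];
  const walk = (node, depth) => {
    lines.push(`${'  '.repeat(depth)}${node.type}: ${node.description}`);
    for (const child of node.children ?? []) walk(child, depth + 1);
  };
  for (const shard of response.profile?.shards ?? []) {
    for (const search of shard.searches ?? []) {
      for (const root of search.query ?? []) walk(root, 0);
    }
  }
  return lines;
}

// Hand-written sample in the assumed shape (illustrative values only).
const sample = {
  profile: {
    shards: [{
      searches: [{
        query: [{
          type: 'BooleanQuery',
          description: 'root',
          children: [
            {
              type: 'ConstantScoreQuery',
              description: 'ConstantScore(description._index_prefix:turtl time tri)'
            },
            {
              type: 'BooleanQuery',
              description: '+description:turtl +description:time +ConstantScore(description._index_prefix:tri)'
            }
          ]
        }]
      }]
    }]
  }
};

console.log(outlineProfile(sample).join('\n'));
```

In real use, the argument would be the response body of the search request above sent with "profile": true.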

Conceptually, it seems like the following queries:

For each target field, try to match the full phrase against the _index_prefix field.
For each target field, try to match every term except the last against the single-term field, and match just the last term as a constant-score query against the _index_prefix field.
If there are enough terms to form pairs (three terms in this example), match the leading pairs against the _2gram field, and the last pair against the _index_prefix field.
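
One way to read the profile output is as shingles (overlapping word n-grams) per subfield, with the last shingle sent to _index_prefix. The sketch below is a toy model that reproduces the clause strings shown above. It is inferred from that output rather than taken from MultiMatchQueryBuilder, the function names are made up, and it assumes the terms have already been analyzed (e.g. "turtle" stemmed to "turtl").

```javascript
// Build overlapping word n-grams (shingles) of the given size.
function shingles(terms, size) {
  const out = [];
  for (let i = 0; i + size <= terms.length; i++) {
    out.push(terms.slice(i, i + size).join(' '));
  }
  return out;
}

// size 1 = root field, 2 = ._2gram, 3 = ._3gram.
// All shingles but the last target the subfield; the last one becomes a
// constant-score prefix clause on <field>._index_prefix.
function sketchClauses(field, terms, size) {
  const target = size === 1 ? field : `${field}._${size}gram`;
  const grams = shingles(terms, size);
  // Fewer terms than the shingle size: behavior not covered by this sketch.
  if (grams.length === 0) return [];
  const clauses = grams.slice(0, -1).map((g) => `+${target}:${g}`);
  clauses.push(`+ConstantScore(${field}._index_prefix:${grams[grams.length - 1]})`);
  return clauses;
}

const terms = ['turtl', 'time', 'tri'];
for (const size of [1, 2, 3]) {
  console.log(sketchClauses('description', terms, size).join(' '));
}
```

Under this model, the _3gram field with exactly three terms yields a single prefix clause on the whole phrase (the profile output prints it without the leading +), and the _2gram case shows "time" in both shingles because consecutive shingles overlap by one word.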

I still wish for some more detail here. For example, I can't tell if the clauses in the BooleanQueries are "shoulds" or "musts", and if they are shoulds, what the minimum_should_match value is.

It's also surprising that in the last query, (+description._2gram:turtl time +ConstantScore(description._index_prefix:time tri)), the second word from the query, "time", is used twice: once as part of the 2gram and again as part of the _index_prefix clause.

system · April 18, 2022, 3:05pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
