Slow Query Performance

hmacarthur · October 18, 2024, 7:25pm

Hi. I've been noticing slower-than-desired query performance on my Elasticsearch cluster and am hoping for guidance. My queries use exact matching on all terms except for a single wildcard term match. I've attached a specific query below; the entire query took ~700ms. I know that the double wildcard (leading and trailing) is expensive: when I tested with a single wildcard, the latency dropped significantly. I'm mainly interested in the following:

How does Elasticsearch plan / execute this query? Does it first execute the accountId match (since it's the lowest cost) and then execute the wildcard query? In this query, would the merchantName match only get applied to documents that are first verified to have a matching accountId?

What are my options for improving performance? I've read that n-gram analyzers are more performant than wildcard queries. I could also use the wildcard field type.

For reference, I've been reading this document for insight.

Thanks!

{
    "from": 0,
    "size": 41,
    "query": {
        "bool": {
            "must": [
                {
                    "constant_score": {
                        "filter": {
                            "terms": {
                                "accountId": [
                                    "....."
                                ],
                                "boost": 1
                            }
                        },
                        "boost": 1
                    }
                },
                {
                    "wildcard": {
                        "merchantName": {
                            "wildcard": "*E*",
                            "boost": 1
                        }
                    }
                }
            ],
            "adjust_pure_negative": true,
            "boost": 1
        }
    },
    "_source": {
        "includes": [
            "id"
        ],
        "excludes": []
    },
    "sort": [
        {
            "transactionTime": {
                "order": "desc"
            }
        }
    ]
}
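
One variation that may be worth benchmarking against the query above, sketched here under assumptions rather than as a confirmed fix: since the results are sorted by transactionTime and scores are not used, both clauses can be placed in the bool query's filter context, where scoring is skipped and clauses are considered for caching. The function name and the placeholder account id are hypothetical. This does not by itself remove the cost of the leading wildcard.

```python
import json


def build_filter_query(account_ids, pattern, size=41):
    """Build a search body with both clauses in bool.filter (no scoring).

    Field names are taken from the query in the post; everything else
    (function name, defaults) is illustrative only.
    """
    return {
        "from": 0,
        "size": size,
        "query": {
            "bool": {
                "filter": [
                    # Exact match on the account id(s).
                    {"terms": {"accountId": account_ids}},
                    # Same wildcard pattern, using the documented "value" key.
                    {"wildcard": {"merchantName": {"value": pattern}}},
                ]
            }
        },
        "_source": {"includes": ["id"]},
        "sort": [{"transactionTime": {"order": "desc"}}],
    }


# "....." stands in for the account id elided in the original post.
body = build_filter_query(["....."], "*E*")
print(json.dumps(body, indent=2))
```

Comparing this body with the original using the Profile API ("profile": true in the request) should show where the time is actually spent.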

Justin_Castilla · October 18, 2024, 8:02pm

Hello Henry,
I would definitely recommend trying out the Ngram path. Just yesterday a community member gave a talk about creating a customized search filter using Ngram tokens, specifically two different ones chained together.

This is pretty tightly coupled to the email string format, but it should offer some possible actions to try:

Mark_Harwood1 · October 21, 2024, 7:14am

The wildcard field uses ngrams under the covers.
It parses wildcard and regex queries into suitable ngrams and constructs a rough query out of them for fast disk lookup, then double-checks any candidate matches to see if the doc's value actually fully matches the original query pattern.
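
The two-phase idea described above (a rough n-gram lookup followed by verification) can be illustrated with a toy in-memory sketch. This is not the Elasticsearch implementation: the 3-character gram size, the function names, and the sample merchant names are all assumptions made for illustration, and only the `*` and `?` wildcards are handled.

```python
import fnmatch
import re
from collections import defaultdict

N = 3  # gram size assumed for this toy example


def ngrams(text, n=N):
    """Return the set of n-character substrings of text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def build_index(values):
    """Map each n-gram to the ids of the values that contain it."""
    index = defaultdict(set)
    for doc_id, value in enumerate(values):
        for gram in ngrams(value):
            index[gram].add(doc_id)
    return index


def search(pattern, values, index):
    """Phase 1: narrow candidates via n-grams. Phase 2: verify the full pattern."""
    # Literal runs between wildcard characters, e.g. "*buck*" -> ["buck"].
    literals = [part for part in re.split(r"[*?]", pattern) if part]
    required = set()
    for literal in literals:
        required |= ngrams(literal)
    if required:
        # Only values containing every required n-gram can match.
        candidates = set.intersection(*(index.get(g, set()) for g in required))
    else:
        # No literal run is long enough to yield an n-gram: check everything.
        candidates = set(range(len(values)))
    # Verification step: the rough lookup can produce false positives.
    return sorted(d for d in candidates if fnmatch.fnmatchcase(values[d], pattern))


values = ["Starbucks", "Shell Oil", "Best Buy", "Whole Foods"]
index = build_index(values)
print(search("*buck*", values, index))  # narrowed by the grams "buc" and "uck"
print(search("*e*", values, index))     # too short for any gram, so all values are verified
```

In this toy version a one-character pattern such as `*e*` gets no help from the n-gram index and every value is verified, so it is worth measuring very short patterns separately on a real cluster.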

So it does a lot of the hard work for you.
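
A minimal sketch of what trying the wildcard field type could look like, assuming merchantName is currently mapped as keyword or text. The helper names are hypothetical, and only the merchantName field is shown because the rest of the mapping is not given in the thread. The type of an existing field cannot be changed in place, so this would mean creating a new index with this mapping and reindexing into it.

```python
import json


def wildcard_mapping(field="merchantName"):
    """Index-creation body that maps the given field as the wildcard type."""
    return {"mappings": {"properties": {field: {"type": "wildcard"}}}}


def wildcard_query(field, pattern):
    """Wildcard query body; the query syntax is the same as for a keyword field."""
    return {"query": {"wildcard": {field: {"value": pattern}}}}


mapping = wildcard_mapping()
query = wildcard_query("merchantName", "*E*")

# Send `mapping` when creating the new index, then `query` to its _search endpoint.
print(json.dumps(mapping, indent=2))
print(json.dumps(query, indent=2))
```

The query itself does not change; the difference is in how the field is indexed, which is why benchmarking both mappings on the same data is the most reliable comparison.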
