How to increase the speed of response?

octavian-nita · September 11, 2024, 9:21am

While better hardware will undoubtedly help, I fully endorse (based on personal experience) finding alternatives to the wildcard query for partial string/substring matching, which can quickly become slow and otherwise resource-intensive, especially when using leading wildcards.

Among options:

If you need to use wildcard queries, consider using the wildcard type for those fields that you query (assuming you don't query the whole _source and that you're using Elasticsearch v7.9 or later); see Find strings within strings faster with the Elasticsearch wildcard field | Elastic Blog for a nice intro. It's important to note that you need to reindex your dataset for these changes to take effect.
Use fuzzy queries; I have yet to use them myself, but they can prove slow for large datasets, and adjusting the fuzziness attribute value could help balance accuracy and performance.
Use match queries and custom analyzers to index fields as (edge-)N-grams; this is likely one of the most convenient solutions, and you can use the analyze API (Test an analyzer | Elasticsearch Guide [8.15] | Elastic) to test your analyzers. You also need to reindex your dataset. Watch out for situations where indexing produces an abundance of tokens because long values are combined with a wide gram range (e.g., min_gram=2, max_gram=10); you can mitigate this by using n-grams with a smaller min_gram to max_gram range (e.g., 3 to 10 characters) or by adding a search-as-you-type field.
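
To get a feel for the token counts mentioned in the last option, here is a rough pure-Python sketch (not Elasticsearch code) that mimics what an n-gram and an edge n-gram tokenizer would emit for a single value. The helper names ngrams and edge_ngrams and the sample value are made up for illustration, and real analyzers also depend on settings such as token_chars, so treat the counts as an estimate only.

```python
def ngrams(text, min_gram, max_gram):
    """Every substring of length min_gram..max_gram (roughly an n-gram tokenizer)."""
    return [
        text[start:start + size]
        for start in range(len(text))
        for size in range(min_gram, max_gram + 1)
        if start + size <= len(text)
    ]


def edge_ngrams(text, min_gram, max_gram):
    """Only the prefixes of length min_gram..max_gram (roughly an edge n-gram tokenizer)."""
    longest = min(max_gram, len(text))
    return [text[:size] for size in range(min_gram, longest + 1)]


value = "esdkh@gmail.com"  # hypothetical sample value, 15 characters

print(len(ngrams(value, 2, 10)))       # token count for min_gram=2, max_gram=10
print(len(ngrams(value, 3, 10)))       # token count for min_gram=3, max_gram=10
print(len(edge_ngrams(value, 2, 10)))  # token count when only prefixes are kept
print(edge_ngrams(value, 2, 5))        # the prefixes themselves
```

For this 15-character value the two full n-gram settings come out at 90 and 76 tokens, while the edge variant yields 9, which is why edge n-grams are usually the cheaper choice when prefix matching is enough.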

In any case, you should thoroughly analyze the requirements to fine-tune the analyzers and queries—maybe not all fields need to be indexed, etc.

Good luck!

dsagent · September 11, 2024, 3:30pm

If I have data like this:

{
  "gmail": "dsagent esdkh@gmail.com"
}

{
  "gmail": "dsagent esdkh@gmail.org"
}

and I want to retrieve the emails that contain "es" and end with ".com",

is there a way other than wildcard?

octavian-nita · September 12, 2024, 7:26am

If this is, indeed, what your searched fields look like, a wildcard query might not necessarily be the problem (since the values are rather short). I would still map gmail as a wildcard field (see Find strings within strings faster with the Elasticsearch wildcard field | Elastic Blog, Keyword type family | Elasticsearch Guide [8.15] | Elastic) and then query it with a regular wildcard query.
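
As a minimal sketch of that suggestion, the Python snippet below only builds and prints the two request bodies (one for creating the index, one for searching). The pattern *es*.com is my reading of "contains es and ends with .com", and the index name placeholder in the comments is hypothetical. The fnmatch check at the end is just a local approximation of wildcard semantics for patterns that use only * and ?; it does not talk to Elasticsearch.

```python
import json
from fnmatch import fnmatchcase

# Index body: map "gmail" as a wildcard field (needs Elasticsearch 7.9 or later).
index_body = {"mappings": {"properties": {"gmail": {"type": "wildcard"}}}}

# Search body: values that contain "es" and end with ".com".
pattern = "*es*.com"
search_body = {"query": {"wildcard": {"gmail": {"value": pattern}}}}

print(json.dumps(index_body, indent=2))   # body for: PUT <my-index>
print(json.dumps(search_body, indent=2))  # body for: GET <my-index>/_search

# Local sanity check of the pattern against the sample values from the question.
docs = ["dsagent esdkh@gmail.com", "dsagent esdkh@gmail.org"]
matches = [doc for doc in docs if fnmatchcase(doc, pattern)]
print(matches)
```

Existing documents still have to be reindexed into the new mapping before the query can use the wildcard field.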

Christian_Dahlqvist · September 12, 2024, 7:30am

If you know exactly what you are searching for with the wildcard and do this frequently, why not parse that out into a separate field when you index so you can instead use an efficient term query?

dsagent · September 12, 2024, 8:07am

I am looking for an email and a URL
Can you give me an example?

dsagent · September 12, 2024, 8:17am

Yes, the data looks like this, but I'm not searching by wildcard
Example:

GET index_name/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "wildcard": {
            "email": {
              "value": "esdkh"
            }
          }
        },
        {
          "wildcard": {
            "email": {
              "value": "*.org" # or .com ......
            }
          }
        }
      ]
    }
  }
}

octavian-nita · September 13, 2024, 9:38am

Well, it seems you know what/where you are searching, so, as @Christian_Dahlqvist points out, the most sensible approach would be to parse those email parts into separate fields, either in your front-end app (in case you have one) or by using Logstash to ingest the data. (In any case, if you're not using wildcard characters in the query input, then don't use the wildcard query at all; a match query will do.)
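
For the client-side variant, a minimal Python sketch could look like the following. The field names email_local, email_domain and email_tld are hypothetical, the sketch assumes they are mapped as keyword fields, and it assumes the address is the last whitespace-separated token of the raw value (as in "dsagent esdkh@gmail.com"); adapt it to your real data.

```python
import json


def email_parts(raw):
    """Split a raw value into the parts that will be searched, before indexing it."""
    address = raw.split()[-1].lower()  # assumption: the address is the last token
    local_part, _, domain = address.rpartition("@")
    tld = domain.rpartition(".")[2]    # text after the last dot of the domain
    return {
        "email": raw,                  # keep the original value as well
        "email_local": local_part,
        "email_domain": domain,
        "email_tld": tld,
    }


doc = email_parts("dsagent esdkh@gmail.com")
print(json.dumps(doc, indent=2))       # document to send to the index

# Exact-match search on the extracted parts: no wildcard query needed.
search_body = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"email_local": "esdkh"}},
                {"term": {"email_tld": "com"}},
            ]
        }
    }
}
print(json.dumps(search_body, indent=2))
```

The term clauses only match whole values, so this works when you search for a complete local part or top-level domain; for "contains" searches inside the local part you would still need one of the other options.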

Alternatively (and for the sake of the example), you can (re)define your index mapping with settings/analyzers/token filters to customize the indexing of the email field (in what follows, I assume you want to search in the email local-part and the top-level domain):

{
  "settings": {
    "analysis": {
      "filter": {
        "email_parts_flt": {
          "type": "multiplexer",
          "preserve_original": false,
          "filters": ["email_local-part_flt, email_local-part_tokens_flt", "email_tld_flt"]
        },
        "email_local-part_flt": {
          "type": "pattern_capture",
          "preserve_original": false,
          "patterns": ["([^@]*)@"]
        },
        "email_local-part_tokens_flt": {
          "type": "word_delimiter_graph",
          "split_on_case_change": false,
          "split_on_numerics": false
        },
        "email_tld_flt": {
          "type": "pattern_capture",
          "preserve_original": false,
          "patterns": ["(\\.[-A-Za-z0-9]+)$"]
        }
      },
      "analyzer": {
        "email_analyzer": {
          "tokenizer": "keyword",
          "filter": ["email_parts_flt", "lowercase", "unique"]
        },
        "email_search_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "email": {
        "type": "text",
        "analyzer": "email_analyzer",
        "search_analyzer": "email_search_analyzer"
      }
    }
  }
}

(don't forget to re-index your existing data, if any).

Testing the email_analyzer like:

POST <my-index>/_analyze
{
  "analyzer": "email_analyzer",
  "text": "dsagent esdkh-XY.foo@gmail.Bar.com"
}

yields the following tokens that index the email value above:

{
  "tokens": [
    {
      "token": "dsagent",
      "start_offset": 0,
      "end_offset": 34,
      "type": "word",
      "position": 0
    },
    {
      "token": "esdkh",
      "start_offset": 0,
      "end_offset": 34,
      "type": "word",
      "position": 1
    },
    {
      "token": "xy",
      "start_offset": 0,
      "end_offset": 34,
      "type": "word",
      "position": 2
    },
    {
      "token": "foo",
      "start_offset": 0,
      "end_offset": 34,
      "type": "word",
      "position": 3
    },
    {
      "token": ".com",
      "start_offset": 0,
      "end_offset": 34,
      "type": "word",
      "position": 3
    }
  ]
}

That means any of those tokens, when used as search input, would match your document with a simple match query.

The solution avoids wildcard queries and moves the burden to the ingestion/indexing phase, but I feel it becomes a bit overstretched and not very easy to read and understand.

dsagent · September 14, 2024, 6:25am

Thank you very much
I'll try this.

dsagent · September 14, 2024, 7:37am

I want to search for dsdkh and .com. How do I query that without using a wildcard?

dsagent · September 14, 2024, 3:58pm

It didn't work.

I made this query but it did not bring any data

POST /index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "email": "smith"
          }
        },
        {
          "wildcard": {
            "email": "smith.com"
          }
        }
      ]
    }
  }
}

octavian-nita · September 23, 2024, 7:57am

Try with a query like:

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "email": "smith"
          }
        },
        {
          "match": {
            "email": ".com"
          }
        }
      ]
    }
  }
}

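It might also improve the match if you removed the "email_search_analyzer" entry from the settings above (along with the search_analyzer line that references it in the mapping), i.e., tell Elasticsearch to use the same analyzer on the input. With the standard tokenizer, a search for .com is most likely reduced to the term com, which does not equal the indexed token .com.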

Having said that, I think you should read a bit about how Elasticsearch matches things (maybe here: Understanding and Working with Match Queries | by Madhusudhan Konda | Medium, etc.).

dsagent · September 24, 2024, 7:25am

ok, thank you
