How to increase the speed of response?

While better hardware will undoubtedly help, I fully endorse (based on personal experience) finding alternatives to the wildcard query for partial string/substring matching, which can quickly become slow and otherwise resource-intensive, especially when using leading wildcards.

Among options:

  1. If you need to use wildcard queries, consider using the wildcard type for those fields that you query (assuming you don't query the whole _source and that you're using Elasticsearch v7.9 or later); see Find strings within strings faster with the Elasticsearch wildcard field | Elastic Blog for a nice intro. It's important to note that you need to reindex your dataset for these changes to take effect.
  2. Use fuzzy queries. I have not used them myself, but they can prove slow on large datasets; adjusting the fuzziness value could help balance accuracy and performance.
  3. Use match queries and custom analyzers to index fields as (edge-)N-grams; this is likely one of the most convenient solutions. You can use the analyze API (Test an analyzer | Elasticsearch Guide [8.15] | Elastic) to test your analyzers. You also need to reindex your dataset. Watch out for configurations that produce an abundance of tokens because a wide gram range is applied to long values (e.g., min_gram=2, max_gram=10); you can mitigate this by using n-grams with a narrower min_gram to max_gram range (e.g., 3 to 10 characters) or by adding a search-as-you-type field.
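
To get a feel for the token counts mentioned in option 3, here is a rough plain-Python sketch (not Elasticsearch code). It enumerates the substrings an n-gram or edge n-gram tokenizer would emit for a single value, assuming every character is kept (no token_chars restriction). The function names and the sample value are only for illustration.

```python
def ngrams(text: str, min_gram: int, max_gram: int) -> list[str]:
    """All substrings of length min_gram..max_gram (n-gram style)."""
    return [
        text[start:start + size]
        for start in range(len(text))
        for size in range(min_gram, max_gram + 1)
        if start + size <= len(text)
    ]


def edge_ngrams(text: str, min_gram: int, max_gram: int) -> list[str]:
    """Only the prefixes of length min_gram..max_gram (edge n-gram style)."""
    longest = min(max_gram, len(text))
    return [text[:size] for size in range(min_gram, longest + 1)]


value = "esdkh@gmail.com"  # 15 characters
print(len(ngrams(value, 2, 10)))       # 90
print(len(ngrams(value, 3, 10)))       # 76
print(len(edge_ngrams(value, 2, 10)))  # 9
```

These counts are per indexed value, so they add up quickly across a large index. Edge n-grams keep the count small, but they only help when the search input is a prefix of the value.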

In any case, you should thoroughly analyze the requirements to fine-tune the analyzers and queries—maybe not all fields need to be indexed, etc.

Good luck!

If I have data like this:
{
  "gmail": "dsagent esdkh@gmail.com"
}

{
  "gmail": "dsagent esdkh@gmail.org"
}
And I want to retrieve the email that contains "es" and ends with ".com".

Is there a way other than wildcard?

If this is, indeed, what your searched fields look like, a wildcard query might not necessarily be the problem (since the values are rather short). I would still map gmail as a wildcard field (see Find strings within strings faster with the Elasticsearch wildcard field | Elastic Blog, Keyword type family | Elasticsearch Guide [8.15] | Elastic) and then query it with a regular wildcard query.
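
A minimal sketch of that suggestion, written as Python dictionaries that mirror the JSON request bodies (create the index, then search). The pattern `*es*.com` is my reading of "contains es and ends with .com". The `fnmatchcase` check is only a local stand-in to show which of the two sample values the pattern selects; it is not how Elasticsearch evaluates the query.

```python
import json
from fnmatch import fnmatchcase

# Body for PUT <index>: map the field with the wildcard type (Elasticsearch 7.9+).
mapping_body = {"mappings": {"properties": {"gmail": {"type": "wildcard"}}}}

# Body for GET <index>/_search: a regular wildcard query against that field.
pattern = "*es*.com"
search_body = {"query": {"wildcard": {"gmail": {"value": pattern}}}}

# Local illustration only: which sample values does the pattern select?
samples = ["dsagent esdkh@gmail.com", "dsagent esdkh@gmail.org"]
selected = [value for value in samples if fnmatchcase(value, pattern)]

print(json.dumps(mapping_body))
print(json.dumps(search_body))
print(selected)  # ['dsagent esdkh@gmail.com']
```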

If you know exactly what you are searching for with the wildcard and do this frequently, why not parse that out into a separate field when you index so you can instead use an efficient term query?
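
As a rough illustration of that idea, the Python sketch below splits a value into parts before it is sent to Elasticsearch. The field names (email_local, email_domain, email_tld) are hypothetical, and the code assumes the address is the last whitespace-separated token, as in the sample data in this thread. The same split could be done wherever your documents are built before indexing.

```python
def split_email(raw: str) -> dict:
    """Extract the address parts to store next to the original value."""
    parts = raw.split()
    if not parts or "@" not in parts[-1]:
        raise ValueError(f"no email address found in {raw!r}")
    address = parts[-1].lower()
    local_part, _, domain = address.rpartition("@")
    return {
        "email": address,
        "email_local": local_part,
        "email_domain": domain,
        "email_tld": domain.rsplit(".", 1)[-1],
    }


doc = split_email("dsagent esdkh@gmail.com")
print(doc)

# Body for GET <index>/_search: exact match on the extracted top-level domain,
# assuming email_tld is mapped as a keyword field (hypothetical mapping).
search_body = {"query": {"term": {"email_tld": {"value": "com"}}}}
```

With the parts stored separately, "ends with .com" becomes an exact term lookup instead of a pattern scan.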

I am looking for an email and a URL
Can you give me an example?

Yes, the data looks like this, but I'm not searching by wildcard
Example:

GET index_name/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "wildcard": {
            "email": {
              "value": "esdkh"
            }
          }
        },
        {
          "wildcard": {
            "email": {
              "value": "*.org" # or .com ......
            }
          }
        }
      ]
    }
  }
}

Well, it seems you know what/where you are searching, so, as @Christian_Dahlqvist points out, the most sensible approach would be to parse/separate those email parts into separate fields, either via your front-end app (in case you have one) or by using Logstash to ingest data. (In any case, if you're not using wildcards in a query input, then don't use the wildcard query at all; a match query will do.)

Alternatively (and for the sake of the example), you can (re)define your index mapping with settings/analyzers/token filters to customize the indexing of the email field (in what follows, I assume you want to search in the email local-part and the top-level domain):

{
  "settings": {
    "analysis": {
      "filter": {
        "email_parts_flt": {
          "type": "multiplexer",
          "preserve_original": false,
          "filters": ["email_local-part_flt, email_local-part_tokens_flt", "email_tld_flt"]
        },
        "email_local-part_flt": {
          "type": "pattern_capture",
          "preserve_original": false,
          "patterns": ["([^@]*)@"]
        },
        "email_local-part_tokens_flt": {
          "type": "word_delimiter_graph",
          "split_on_case_change": false,
          "split_on_numerics": false
        },
        "email_tld_flt": {
          "type": "pattern_capture",
          "preserve_original": false,
          "patterns": ["(\\.[-A-Za-z0-9]+)$"]
        }
      },
      "analyzer": {
        "email_analyzer": {
          "tokenizer": "keyword",
          "filter": ["email_parts_flt", "lowercase", "unique"]
        },
        "email_search_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "email": {
        "type": "text",
        "analyzer": "email_analyzer",
        "search_analyzer": "email_search_analyzer"
      }
    }
  }
}

(Don't forget to reindex your existing data, if any.)

Testing the email_analyzer like:

POST <my-index>/_analyze
{
  "analyzer": "email_analyzer",
  "text": "dsagent esdkh-XY.foo@gmail.Bar.com"
}

yields the following tokens that index the email value above:

{
  "tokens": [
    {
      "token": "dsagent",
      "start_offset": 0,
      "end_offset": 34,
      "type": "word",
      "position": 0
    },
    {
      "token": "esdkh",
      "start_offset": 0,
      "end_offset": 34,
      "type": "word",
      "position": 1
    },
    {
      "token": "xy",
      "start_offset": 0,
      "end_offset": 34,
      "type": "word",
      "position": 2
    },
    {
      "token": "foo",
      "start_offset": 0,
      "end_offset": 34,
      "type": "word",
      "position": 3
    },
    {
      "token": ".com",
      "start_offset": 0,
      "end_offset": 34,
      "type": "word",
      "position": 3
    }
  ]
}

That means any of those tokens, when used as search input, would match your document with a simple match query.
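
To make it easier to see where those tokens come from, here is a plain-Python approximation of the two multiplexer branches, followed by the lowercase and unique steps. It reuses the two regular expressions from the settings above, but the splitting rule is a simplification of word_delimiter_graph, so treat it as a reading aid and rely on the _analyze API for the real behavior. The function name is made up for this sketch.

```python
import re


def approximate_email_tokens(text: str) -> list[str]:
    tokens: list[str] = []

    # Branch 1: capture everything before "@", then split it on
    # non-alphanumeric characters (simplified word_delimiter_graph).
    local = re.search(r"([^@]*)@", text)
    if local:
        tokens += re.findall(r"[A-Za-z0-9]+", local.group(1))

    # Branch 2: capture the last dot-prefixed label (the top-level domain).
    tld = re.search(r"(\.[-A-Za-z0-9]+)$", text)
    if tld:
        tokens.append(tld.group(1))

    # Lowercase, then drop duplicates while keeping the original order.
    return list(dict.fromkeys(token.lower() for token in tokens))


print(approximate_email_tokens("dsagent esdkh-XY.foo@gmail.Bar.com"))
# ['dsagent', 'esdkh', 'xy', 'foo', '.com']
```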

The solution avoids wildcard queries and moves the burden to the ingestion/indexing phase, but I feel it becomes a bit overstretched and not very easy to read and understand.

Thank you very much
I'll try this.

I want to search for dsdkh and .com. How can I query this without using a wildcard?

It didn't work.

I made this query but it did not bring any data

POST /index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "email": "smith"
          }
        },
        {
          "wildcard": {
            "email": "smith.com"
          }
        }
      ]
    }
  }
}

Try with a query like:

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "email": "smith"
          }
        },
        {
          "match": {
            "email": ".com"
          }
        }
      ]
    }
  }
}

It might also improve the match if you removed the "email_search_analyzer" entry from the settings above (along with the search_analyzer line that references it in the mapping), i.e., told Elasticsearch to use the same analyzer on the search input.
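
The reason the search-time analyzer matters: a match query compares the terms produced from the search input with the tokens stored at index time. The sketch below is a toy model of that comparison in plain Python, using the tokens from the _analyze output earlier in this thread. It assumes (worth confirming with the _analyze API) that a search analyzer built on the standard tokenizer turns ".com" into "com"; the helper name is made up for this sketch.

```python
# Tokens stored for the sample value, taken from the _analyze output above.
indexed_tokens = {"dsagent", "esdkh", "xy", "foo", ".com"}


def all_terms_match(query_terms: list[str], indexed: set[str]) -> bool:
    """Toy model of bool/must with match clauses: every term must be indexed."""
    return all(term in indexed for term in query_terms)


# Search input analyzed so that ".com" keeps its leading dot.
print(all_terms_match(["esdkh", ".com"], indexed_tokens))  # True

# Search input where the dot was dropped at search time (assumed behavior).
print(all_terms_match(["esdkh", "com"], indexed_tokens))   # False
```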

Having said that, I think you should read a bit about how Elasticsearch matches things (maybe here: Understanding and Working with Match Queries | by Madhusudhan Konda | Medium, etc.).

ok, thank you