Filtering for wildcard domains

Searching for ends of strings is expensive because the index can't accelerate the lookup: it stores terms sorted alphabetically from the start of the string, so a suffix search ends up scanning all index entries.
One way to counter that is to also store a reversed version of each string, which turns a search for the end of a string into a search for the start of one. It's a little clunky, but here's an example mapping/doc/query that should be more efficient:

DELETE test
PUT test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "reverser": {
          "tokenizer": "keyword",
          "filter": [
            "reverse"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "domain": {
        "type": "keyword",
        "fields": {
          "reversed": {
            "type": "text",
            "analyzer": "reverser"
          }
        }
      }
    }
  }
}
POST test/_analyze
{
  "field": "domain.reversed",
  "text": ["www.foo.com"]
}
POST test/_doc/1
{
  "domain":"www.foo.com"
}
POST test/_search
{
  "query": {
    "match_phrase_prefix": {"domain.reversed":  "foo.com"}
  }
}
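
One caveat worth checking: match_phrase_prefix only expands the prefix to a limited number of index terms, controlled by max_expansions (default 50), so an index holding many domains under the same suffix may return incomplete results. A minimal sketch of raising that limit, where 1000 is an arbitrary value picked for illustration rather than a recommendation:

```console
POST test/_search
{
  "query": {
    "match_phrase_prefix": {
      "domain.reversed": {
        "query": "foo.com",
        "max_expansions": 1000
      }
    }
  }
}
```

Also note that "foo.com" matches any domain ending in those characters (for example "barfoo.com"); searching for ".foo.com" with the leading dot restricts matches to subdomains.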

For multiple domains you would use a bool query with one match_phrase_prefix clause per domain inside the should clause.
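
As a rough sketch of that, assuming the same test index and mapping as above. The second domain, "bar.org", is made up for illustration, and minimum_should_match is spelled out only to make the "at least one clause must match" intent explicit:

```console
POST test/_search
{
  "query": {
    "bool": {
      "should": [
        { "match_phrase_prefix": { "domain.reversed": "foo.com" } },
        { "match_phrase_prefix": { "domain.reversed": "bar.org" } }
      ],
      "minimum_should_match": 1
    }
  }
}
```

Each clause is analyzed by the reverser analyzer independently, so the domains can be written in their normal, unreversed form.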

Another approach is to use the new wildcard field type but, like most things, this has trade-offs. These were discussed here, where the original question was about exactly the same problem (searching for ends of domain names).
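
For comparison, a minimal sketch of the wildcard field approach, assuming a version of Elasticsearch that includes the wildcard field type. The index name test_wildcard is hypothetical and the pattern is just an example:

```console
PUT test_wildcard
{
  "mappings": {
    "properties": {
      "domain": {
        "type": "wildcard"
      }
    }
  }
}
POST test_wildcard/_doc/1
{
  "domain": "www.foo.com"
}
POST test_wildcard/_search
{
  "query": {
    "wildcard": {
      "domain": {
        "value": "*.foo.com"
      }
    }
  }
}
```

With this mapping the leading-wildcard pattern is written directly and no reversed sub-field is needed.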
