Field analyzer for matching concepts and names

Dear Elasticsearch community, dear Elastic team,

I have a question about field analyzers that I would like to discuss here.
In our index we have fields (the field cities in the example below) that store known concepts or names composed of one or more terms. We would like to run queries with the default operator AND, so that every query term has to match in at least one of the searched fields.

Given the following document

{
  "title": "Nice Cities",
  "cities": ["Basel", "New York"]
}

I would like to have the following query behavior:

Query                 Doc matches
nice basel            ✓
nice new york         ✓
nice basel new york   ✓
nice cities           ✓
new york              ✓
cities                ✓
nice york             ✗
york                  ✗

For the queries I tried the following two query types:

GET discuss_elastic/_search
{
  "query": {
    "multi_match": {
      "query": "nice new york",
      "fields": ["title", "cities"],
      "operator": "AND",
      "type": "cross_fields"
    }
  }
}


GET discuss_elastic/_search
{
  "query": {
    "simple_query_string": {
      "query": "nice new york",
      "fields": ["title", "cities"],
      "default_operator": "AND",
      "flags": "WHITESPACE"
    }
  }
}
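(As a debugging aid, the Explain API can show why document 1 does or does not match a given query — for example, to see which terms are missing for the "nice york" case:)

GET discuss_elastic/_explain/1
{
  "query": {
    "multi_match": {
      "query": "nice york",
      "fields": ["title", "cities"],
      "operator": "AND",
      "type": "cross_fields"
    }
  }
}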

I already experimented with the following index design, which produces shingles for the cities field at query time, but unfortunately it does not produce the desired behavior.

PUT discuss_elastic
{
  "settings": {
    "number_of_shards": "1",
    "number_of_replicas": "0",
    "analysis": {
      "filter": {
        "shingle_filter": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 4
        }
      },
      "analyzer": {
        "cities_query_analyzer": {
          "tokenizer": "ws_dot_tokenizer",
          "filter": [
            "lowercase",
            "shingle_filter"
          ]
        },
        "cities_index_analyzer": {
          "tokenizer": "keyword",
          "filter": [
            "lowercase"
          ]
        }
      },
      "tokenizer": {
        "ws_dot_tokenizer": {
          "type": "char_group",
          "tokenize_on_chars": [
            "whitespace",
            "."
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "cities": {
        "type": "text",
        "analyzer": "cities_index_analyzer",
        "search_analyzer": "cities_query_analyzer"
      }
    }
  }
}

PUT discuss_elastic/_doc/1
{
  "title": "Nice Cities",
  "cities": ["Basel", "New York"]
}

I tried to use the cities_query_analyzer for both indexing and querying, but then the field behaves like a normal text field and therefore also matches the query "nice york" (the shingle filter emits unigrams as well, so "york" ends up indexed as a separate token).


In my initial post, I had the cities_query_analyzer set for both indexing and querying, which was a copy-and-paste mistake.

Does anyone have an idea how to achieve the desired behavior?

The solution I tried was inspired by the book "Relevant Search" by @softwaredoug and @JnBrymn. Maybe you guys can help here :slight_smile:

Nobody has an idea?
