Field analyzer for matching concepts and names

Dear Elasticsearch community, dear Elastic team,

I have a question about field analyzers that I would like to discuss here.
In our index we have fields (the field cities in the example below) that store known concepts or names composed of one or more terms. We would like to run queries with the default operator AND, so that every query term has to match in at least one of the searched fields.

Given the following document

{
  "title": "Nice Cities",
  "cities": ["Basel", "New York"]
}

I would like to have the following query behavior:

Query                 Doc matches
nice basel            ✓
nice new york         ✓
nice basel new york   ✓
nice cities           ✓
new york              ✓
cities                ✓
nice york             ✗
york                  ✗

For the queries I tried the following two query types:

GET discuss_elastic/_search
{
  "query": {
    "multi_match": {
      "query": "nice new york",
      "fields": ["title", "cities"],
      "operator": "AND",
      "type": "cross_fields"
    }
  }
}


GET discuss_elastic/_search
{
  "query": {
    "simple_query_string": {
      "query": "nice new york",
      "fields": ["title", "cities"],
      "default_operator": "AND",
      "flags": "WHITESPACE"
    }
  }
}
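(As a debugging aid, the Explain API can show why document 1 does or does not match a given query — for example, to see which terms are missing for the "nice york" case:)

GET discuss_elastic/_explain/1
{
  "query": {
    "multi_match": {
      "query": "nice york",
      "fields": ["title", "cities"],
      "operator": "AND",
      "type": "cross_fields"
    }
  }
}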

I already experimented with the following index design, which produces shingles for the cities field at query time, but unfortunately it does not produce the desired behavior.

PUT discuss_elastic
{
  "settings": {
    "number_of_shards": "1",
    "number_of_replicas": "0",
    "analysis": {
      "filter": {
        "shingle_filter": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 4
        }
      },
      "analyzer": {
        "cities_query_analyzer": {
          "tokenizer": "ws_dot_tokenizer",
          "filter": [
            "lowercase",
            "shingle_filter"
          ]
        },
        "cities_index_analyzer": {
          "tokenizer": "keyword",
          "filter": [
            "lowercase"
          ]
        }
      },
      "tokenizer": {
        "ws_dot_tokenizer": {
          "type": "char_group",
          "tokenize_on_chars": [
            "whitespace",
            "."
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "cities": {
        "type": "text",
        "analyzer": "cities_index_analyzer",
        "search_analyzer": "cities_query_analyzer"
      }
    }
  }
}

PUT discuss_elastic/_doc/1
{
  "title": "Nice Cities",
  "cities": ["Basel", "New York"]
}

I tried to use the cities_query_analyzer for both indexing and querying, but then the field behaves like a normal text field and therefore also matches the query "nice york" (the shingle filter emits unigrams as well, so "york" ends up indexed as a separate token).


In my initial post, I had the cities_query_analyzer set for both indexing and querying, which was a copy-and-paste mistake.

Does anyone have an idea how to achieve the desired behavior?

The solution I tried was inspired by the book "Relevant Search" by @softwaredoug and @JnBrymn. Maybe you guys can help here :slight_smile:

Nobody has an idea?
