Dear Elasticsearch community, dear Elastic team,
I have a question about field analyzers that I would like to discuss here.
In our index we have fields (such as cities in the example below) that store known concepts or names made up of one or more terms. We would like to run queries with the default operator AND, so that every query term has to match in at least one of the searched fields.
Given the following document:
{
  "title": "Nice Cities",
  "cities": ["Basel", "New York"]
}
I would like to have the following query behavior:
Query | Doc matches |
---|---|
nice basel | yes |
nice new york | yes |
nice basel new york | yes |
nice cities | yes |
new york | yes |
cities | yes |
nice york | no |
york | no |
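To make the intended behavior precise, here is a minimal Python sketch of the matching rule as I understand it: every query term must be covered either by a term of the title or by a complete city name, so "new york" counts only as a whole. The function names `tokenize` and `doc_matches` are hypothetical, and the expectation that "nice york" and "york" should not match is my reading of the requirement. This is a reference model of the desired outcome, not of how Elasticsearch evaluates a query.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a simple analyzer)."""
    return text.lower().split()


def doc_matches(query, doc):
    """Return True if every query term is covered by a title term or by a
    complete city name (multi-word names must appear in full and in order)."""
    terms = tokenize(query)
    title_terms = set(tokenize(doc["title"]))
    city_names = [tokenize(city) for city in doc["cities"]]

    def covered_from(i):
        # All terms consumed: the whole query is covered.
        if i == len(terms):
            return True
        # Option 1: the current term is a single term of the title.
        if terms[i] in title_terms and covered_from(i + 1):
            return True
        # Option 2: the next terms spell out a complete city name.
        for name in city_names:
            if name and terms[i:i + len(name)] == name and covered_from(i + len(name)):
                return True
        return False

    return covered_from(0)


doc = {"title": "Nice Cities", "cities": ["Basel", "New York"]}
for query in ["nice basel", "nice new york", "nice basel new york",
              "nice cities", "new york", "cities", "nice york", "york"]:
    print(f"{query!r}: {doc_matches(query, doc)}")
```

Under this rule the first six queries match and the last two do not, because "york" alone is neither a title term nor a complete city name.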
I tried the following two query types:
GET discuss_elastic/_search
{
  "query": {
    "multi_match": {
      "query": "nice new york",
      "fields": ["title", "cities"],
      "operator": "AND",
      "type": "cross_fields"
    }
  }
}

GET discuss_elastic/_search
{
  "query": {
    "simple_query_string": {
      "query": "nice new york",
      "fields": ["title", "cities"],
      "default_operator": "AND",
      "flags": "WHITESPACE"
    }
  }
}
I have already experimented with the following index design, which produces shingles for the cities field at query time, but unfortunately it does not give the desired results.
PUT discuss_elastic
{
  "settings": {
    "number_of_shards": "1",
    "number_of_replicas": "0",
    "analysis": {
      "filter": {
        "shingle_filter": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 4
        }
      },
      "analyzer": {
        "cities_query_analyzer": {
          "tokenizer": "ws_dot_tokenizer",
          "filter": [
            "lowercase",
            "shingle_filter"
          ]
        },
        "cities_index_analyzer": {
          "tokenizer": "keyword",
          "filter": [
            "lowercase"
          ]
        }
      },
      "tokenizer": {
        "ws_dot_tokenizer": {
          "type": "char_group",
          "tokenize_on_chars": [
            "whitespace",
            "."
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "cities": {
        "type": "text",
        "analyzer": "cities_index_analyzer",
        "search_analyzer": "cities_query_analyzer"
      }
    }
  }
}

PUT discuss_elastic/_doc/1
{
  "title": "Nice Cities",
  "cities": ["Basel", "New York"]
}
I also tried using cities_query_analyzer for both indexing and querying, but the field then behaves like a normal text field and therefore also matches the query "nice york".
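To illustrate what I think is going on, here is a rough Python simulation of the token streams the two analyzers produce. It is only a sketch: the function names are hypothetical stand-ins for the analyzers defined in the settings, and it assumes the shingle filter's default of also emitting unigrams (`output_unigrams`). It models the emitted tokens only, not how Elasticsearch builds the final query from them.

```python
import re


def char_group_tokenize(text):
    """Split on whitespace and '.', mirroring the ws_dot_tokenizer settings."""
    return [tok for tok in re.split(r"[\s.]+", text) if tok]


def shingles(tokens, min_size=2, max_size=4, output_unigrams=True):
    """Emit each unigram followed by the shingles that start at its position."""
    out = []
    for start in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[start])
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                out.append(" ".join(tokens[start:start + size]))
    return out


def cities_index_analyzer(text):
    """keyword tokenizer + lowercase: the whole value becomes one token."""
    return [text.lower()]


def cities_query_analyzer(text):
    """char_group tokenizer + lowercase + shingle filter."""
    return shingles([tok.lower() for tok in char_group_tokenize(text)])


cities = ["Basel", "New York"]

# Tokens in the index with the mapping as posted.
indexed_as_designed = {t for c in cities for t in cities_index_analyzer(c)}
# Tokens in the index when the query analyzer is also used for indexing.
indexed_with_query_analyzer = {t for c in cities for t in cities_query_analyzer(c)}

print(sorted(indexed_as_designed))          # ['basel', 'new york']
print(sorted(indexed_with_query_analyzer))  # ['basel', 'new', 'new york', 'york']
print(cities_query_analyzer("nice new york"))
# ['nice', 'nice new', 'nice new york', 'new', 'new york', 'york']
```

With the query analyzer used at index time, the unigram "york" ends up in the index, which would explain why "nice york" matches. With the keyword-based index analyzer only "new york" is indexed, yet the query side still emits the unigrams "new" and "york", which have no counterpart in the index.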