List of string must match all provided values

Nicolas_Rey · July 8, 2020, 1:12pm

Hi,

I'm new to Elastic Search, and I simply would like to make a query that filter all entries that have exactly all the supplied values (not an OR but an AND).

Here is an example of my document

[
    {
    "Areas" : [
    "Area A",
    "Area B",
    "Area C",
    "Area D"
    ],
    "UserId" : "User 1"
    },
    {
    "Areas" : [
    "Area A",
    "Area C",
    ],
    "UserId" : "User 2"
    }
]

Let's say I want to filter all entries where Areas contains all the values "Area A" and "Area B"

I tried with the terms filter but it provides an OR filering, so here both my User 1 and User 2 are selected where I would like only the User 1

      "query": {
        "bool": {
          "must": [
            {
              "terms": {
                "areas.keyword": [
                  "Area A",
                  "Area B"
                ]
              }
            }
          ]
    	 }
    	}

Here is the field mapping

        "areas": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }

Could you help me please ?
Thank you

Nicolas_Rey · July 8, 2020, 1:38pm

I tried this

  "query": {
    "bool": {
      "must": [
		{
          "match_phrase": {
            "areas": "Area A"
          }
        },
          {
          "match_phrase": {
            "areas": "Area B"
          }
        }
      ]
	 }
	}

it seems to work, but is it correct or does it exist a better solution ?

Vinayak_Sapre · July 8, 2020, 6:51pm

@Nicolas_Rey
Can you try this

{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "Areas.keyword": "Area A"
          }
        },
        {
          "term": {
            "Areas.keyword": "Area B"
          }
        }
      ]
    }
  },
  "sort": [
    {
      "_doc": {
        "order": "asc"
      }
    }
  ]
}

Also note your mapping has field name areas lower case A. But document has field with upper case A.

Christian_Dahlqvist · July 8, 2020, 7:00pm

Have you looked at the terms set query?

Nicolas_Rey · July 9, 2020, 6:56am

@Vinayak_Sapre

Hi, thanks it seems to work correctly, is there any difference using your solution (term) vs mine (match_phrase) ?

Nicolas_Rey · July 9, 2020, 6:59am

@Vinayak_Sapre

comparing the score
match_phrase : max_score" : 4.4969654
term: "max_score" : 8.961212

I don't know if it's significant ?

Vinayak_Sapre · July 9, 2020, 7:26am

@Nicolas_Rey

Phrase query is typically used if you want to match token "Area" followed by token "A" anywhere in a long analyzed string. So "Foo Area A Bar" will match too.

I think your requirement is "exact match" which is achieved by Term query. For exact match there is no need to compute score. So you sort on some field (_doc if no specific sorting required) or wrap it in a constant score query to avoid scoring overhead.

Best if you can profile queries using by adding "profile": true to see how breakdown looks.

Nicolas_Rey · July 9, 2020, 8:54am

@Vinayak_Sapre

Thank I understand now the difference between term and match_phrase query.

But when you say :

For exact match there is no need to compute score. So you sort on some field (_doc if no specific sorting required) or wrap it in a constant score query to avoid scoring overhead.

I don't understand what you mean here ? Indeed the "max_score" is a part of the output when I play the DSL query with Kibana. Is it not relevant at all in my case ?

Vinayak_Sapre · July 9, 2020, 12:37pm

@Nicolas_Rey
score is required if you want ElasticSearch to determine how well each document matches. ElasticSearch will sort documents based on score so most relevant documents show up at the top. Score is important if you care about

Matching part of your query for ex. document field has Area but not A.
What's in the field other than what you are searched for. "Foo Area A bar"
There are other scenarios too.

If you use your match_phrase query and search for "Area A" , "Area C", the user 2 document will show up on the top because it matches better compared to the user 1 document.

You can experiment running both queries on your data, use "explain": true to understand why one document is preferred over other.

Not sure why after adding sort, Kibana is showing score. It should be null or in case constant query it should be 1 for all matches.

Nicolas_Rey · July 10, 2020, 1:46pm

@Vinayak_Sapre

Very clear, thank you!

Vinayak_Sapre · July 11, 2020, 4:59am

@Nicolas_Rey
Sorry I was wrong about sort. If you do not want to compute score, you need put term queries in filter clause

{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "Areas.keyword": "Area A"
          }
        },
        {
          "term": {
            "Areas.keyword": "Area B"
          }
        }
      ]
    }
  }
}

Nicolas_Rey · July 13, 2020, 3:39pm

@Vinayak_Sapre

If I do such (put term queries in filter clause instead of must clause), will I have the exact same outputs (ie same documents that will be filtered) ?

What is the benefit of not compute the score ?

Vinayak_Sapre · July 15, 2020, 6:22am

@Nicolas_Rey
Yes you will get same results.

If you don't need score why waste cpu cycles in computing it? For a single query is not going to save you a lot. But If this query is called lots of time time, it adds up.

Nicolas_Rey · July 15, 2020, 1:07pm

@Vinayak_Sapre

It makes sense, thank you again

system · August 12, 2020, 1:07pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Boolean Exact Query Elasticsearch	5	675	October 8, 2017
Exact word search Elasticsearch	3	296	May 21, 2020
Match on arrays Elasticsearch	3	413	July 6, 2017
Help with a query - problem matching documents Elasticsearch	2	374	August 9, 2019
Elastic Search Query to match Only if Specific value is present Elasticsearch	3	1103	June 25, 2022

List of string must match all provided values

Related topics