List of string must match all provided values

Hi,

I'm new to Elastic Search, and I simply would like to make a query that filter all entries that have exactly all the supplied values (not an OR but an AND).

Here is an example of my document

[
    {
    "Areas" : [
    "Area A",
    "Area B",
    "Area C",
    "Area D"
    ],
    "UserId" : "User 1"
    },
    {
    "Areas" : [
    "Area A",
    "Area C",
    ],
    "UserId" : "User 2"
    }
]

Let's say I want to filter all entries where Areas contains all the values "Area A" and "Area B"

I tried with the terms filter but it provides an OR filering, so here both my User 1 and User 2 are selected where I would like only the User 1

      "query": {
        "bool": {
          "must": [
            {
              "terms": {
                "areas.keyword": [
                  "Area A",
                  "Area B"
                ]
              }
            }
          ]
    	 }
    	}

Here is the field mapping

        "areas": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }

Could you help me please ?
Thank you

I tried this

  "query": {
    "bool": {
      "must": [
		{
          "match_phrase": {
            "areas": "Area A"
          }
        },
          {
          "match_phrase": {
            "areas": "Area B"
          }
        }
      ]
	 }
	}

it seems to work, but is it correct or does it exist a better solution ?

@Nicolas_Rey
Can you try this

{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "Areas.keyword": "Area A"
          }
        },
        {
          "term": {
            "Areas.keyword": "Area B"
          }
        }
      ]
    }
  },
  "sort": [
    {
      "_doc": {
        "order": "asc"
      }
    }
  ]
}

Also note your mapping has field name areas lower case A. But document has field with upper case A.

1 Like

Have you looked at the terms set query?

1 Like

@Vinayak_Sapre

Hi, thanks it seems to work correctly, is there any difference using your solution (term) vs mine (match_phrase) ?

@Vinayak_Sapre

comparing the score
match_phrase : max_score" : 4.4969654
term: "max_score" : 8.961212

I don't know if it's significant ?

@Nicolas_Rey

Phrase query is typically used if you want to match token "Area" followed by token "A" anywhere in a long analyzed string. So "Foo Area A Bar" will match too.

I think your requirement is "exact match" which is achieved by Term query. For exact match there is no need to compute score. So you sort on some field (_doc if no specific sorting required) or wrap it in a constant score query to avoid scoring overhead.

Best if you can profile queries using by adding "profile": true to see how breakdown looks.

@Vinayak_Sapre

Thank I understand now the difference between term and match_phrase query.

But when you say :

For exact match there is no need to compute score. So you sort on some field (_doc if no specific sorting required) or wrap it in a constant score query to avoid scoring overhead.

I don't understand what you mean here ? Indeed the "max_score" is a part of the output when I play the DSL query with Kibana. Is it not relevant at all in my case ?

@Nicolas_Rey
score is required if you want ElasticSearch to determine how well each document matches. ElasticSearch will sort documents based on score so most relevant documents show up at the top. Score is important if you care about

  1. Matching part of your query for ex. document field has Area but not A.
  2. What's in the field other than what you are searched for. "Foo Area A bar"
    There are other scenarios too.

If you use your match_phrase query and search for "Area A" , "Area C", the user 2 document will show up on the top because it matches better compared to the user 1 document.

You can experiment running both queries on your data, use "explain": true to understand why one document is preferred over other.

Not sure why after adding sort, Kibana is showing score. It should be null or in case constant query it should be 1 for all matches.

1 Like

@Vinayak_Sapre

Very clear, thank you!

@Nicolas_Rey
Sorry I was wrong about sort. If you do not want to compute score, you need put term queries in filter clause

{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "Areas.keyword": "Area A"
          }
        },
        {
          "term": {
            "Areas.keyword": "Area B"
          }
        }
      ]
    }
  }
}

@Vinayak_Sapre

If I do such (put term queries in filter clause instead of must clause), will I have the exact same outputs (ie same documents that will be filtered) ?

What is the benefit of not compute the score ?

@Nicolas_Rey
Yes you will get same results.

If you don't need score why waste cpu cycles in computing it? For a single query is not going to save you a lot. But If this query is called lots of time time, it adds up.

@Vinayak_Sapre

It makes sense, thank you again

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.