Elasticsearch must_not on all elements in array

Hi,

I have an object
{
"id": "first",
"field": [
{
"subfield": "0"
},
{
"subfield": "1"
},
{
"subfield": "2"
}
]
}

I want to search field.subfield =1 and also field.subfield!=1
Where both cases should return this item.

For equal it's easy. I can use QueryBuilders.boolQuery().filter(QueryBuilders.termsQuery("field.subfield", 1));

But not equal is problematic as this
QueryBuilders.boolQuery().mustNot(QueryBuilders.termsQuery("field.subfield", 1));
does not return this item as one item in array is 1.

I'm looking for someway to filter item only if ALL items in array equal 1.
So mixed should still be returned.

Any idea how to achieve this?

Anyone?

Someone else may have a better idea, but there is one way that I can think of. It requires you to map the field field as type nested.

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties" : {
        "field": {
          "type": "nested"
        }
      }
    }
  }
}

If you do so, you can use a nested query to query the subfield fields the way you want. To find all documents with at least one subfield equal to 1:

GET my_index/_search
{
  "query": {
    "nested": {
      "path": "field",
      "query": {
        "match": {
          "field.subfield": "1"
        }
      }
    }
  }
}

And to find all documents with at least one subfield not equal to 1:

GET my_index/_search
{
  "query": {
    "nested": {
      "path": "field",
      "query": {
        "bool": {
          "must_not": [
            {
              "match": {
                "field.subfield": "1"
              }
            }
          ]
        }
      }
    }
  }
}

Another option is to have a scripted search query, like bellow.

Note

The subfield is mapped as a keyword, and the value to be compared with all items of the subfield array is sent as a parameter (valueSub).

doc["field.subfield.keyword"].values.length => how many different values the array has
doc["field.subfield.keyword"].value => the value o the array

GET my_index/_search
{
  "query": {
    "script": {
      "script": {
        "lang": "painless",
        "source": """
                int subLen = doc["field.subfield.keyword"].values.length;
                if(subLen == 1 && doc["field.subfield.keyword"].value == params.valueSubfield)
                 false;
                else
                 true;
        """,
        "params": {
          "valueSubfield": "1"
        }
      }
    }
  }
}

Hi,

thanks!

Abdon this would be ok, but I would need to reindex all the items (which are in millions of large items and takes days).

Tamara yours looks easier but how to scripted filters affect performance?
If it's a big performance hit it might still be better to reindex my items.

Unfortunately, nothing is free in life :slight_smile:

The reindex operation may take a while, but it is a one time thing and after that you can query the nested objects.

Script queries can be computationally expensive, and they may take a long time if you want to run those against a lot of documents (as Elasticsearch will basically have to execute the query against each of your documents every time you use this query).

Maybe you can test if the script query performance is acceptable on your dataset? If it works well for you, then that's great. Maybe a hybrid approach works for you: use a script query until such time that you will have to reindex anyway (for an upgrade or something like that)?

Yeah probably mixed approach will be best.
Will try script and we will see if we need to reindex.

Thanks.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.