Difference of Sets in ES

I have experience working with SQL and am familiar with performing set differences using SQL. Although I understand that Elasticsearch (ES) operates differently, I believe there must be an efficient method for achieving a set difference of two indices in ES. To clarify my question, let's consider the scenario where index A contains all elements within a certain universe, and index B consists of only specific elements from A. Essentially, every element x in B also exists in A.

Given this, I am looking for a way to compute the difference between A and B, that is, to identify the elements in A that do not exist in B. Despite my efforts, I have not found a straightforward way to do this in ES. This is a concern because excluding elements is a common operation in my use case, where A contains millions of records and B is a much smaller subset of A. My logic depends on an accurate count of the excluded elements. I would appreciate any help.

Hello Jader_Gonzalez! Welcome to the forum!

Just off the top of my head without any real testing or validation, would you be able to add a boolean to an element in index A when it also exists in index B? I'm thinking you could filter for elements in index A where the boolean field you create does NOT exist.
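As a rough, untested sketch of how that flag might be set on a document that already lives in index A, a partial update could be used. The index name index_a and the document id 42 are placeholders I made up for illustration, and the typeless _update endpoint assumes Elasticsearch 7.x or later:

```
# Hypothetical index name and id; only the flag field is added,
# the rest of the document is left untouched.
POST index_a/_update/42
{
  "doc": {
    "exists_in_B": true
  }
}
```

This would have to run once for every document that is added to B, so whether it is practical depends on how often B changes.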

Here's an example I found in a previous post that might illustrate what I'm thinking:

DELETE test

PUT test/_doc/1
{
  "key1" : "value1",
  "exists_in_B": true
}

PUT test/_doc/2?refresh
{
  "key2" : "value2"
}

GET test/_search
{
  "query": {
    "bool": {
      "must_not": [
        { "exists": { "field": "exists_in_B"}}
      ]
    }
  }
}
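Since the goal is a count rather than the documents themselves, the same query could probably be sent to the _count endpoint instead of _search. A minimal sketch, reusing the test index and the exists_in_B field from the example above:

```
# Counts documents in "test" that do NOT have the exists_in_B field,
# i.e. the elements of A that are not in B.
GET test/_count
{
  "query": {
    "bool": {
      "must_not": [
        { "exists": { "field": "exists_in_B" } }
      ]
    }
  }
}
```

With the two documents indexed above, only document 2 lacks the field, so the response should report a count of 1.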

This approach assumes that both indices are populated at the same time, or that the boolean field can be added and kept up to date in a timely and efficient manner. Set operations across indices aren't a native feature of Elasticsearch, though they would be a good addition for some use cases. Hopefully this method helps. I'd love to help out more if this isn't something you can do in your case.