Difference of Sets in ES

Jader_Gonzalez · July 11, 2024, 2:13am

I have experience working with SQL and am familiar with performing set differences using SQL. Although I understand that Elasticsearch (ES) operates differently, I believe there must be an efficient method for achieving a set difference of two indices in ES. To clarify my question, let's consider the scenario where index A contains all elements within a certain universe, and index B consists of only specific elements from A. Essentially, every element x in B also exists in A.

Given this, I am looking for a way to compute the difference between A and B, that is, to identify the elements in A that do not exist in B. Despite my efforts to find a solution, this seems hard to do in ES. That is a concern because excluding elements is a common operation in my use case, where A contains millions of records and B is a much smaller subset of A. It is crucial for my logic to count the excluded elements accurately. I would appreciate any help.

Justin_Castilla · July 11, 2024, 6:51pm

Hello Jader_Gonzalez! Welcome to the forum!

Just off the top of my head, without any real testing or validation: would you be able to add a boolean field to each document in index A that also exists in index B? You could then filter for documents in index A where that boolean field does NOT exist.

Here's an example I found in a previous post that might illustrate what I'm thinking:

DELETE test

PUT test/_doc/1
{
  "key1" : "value1",
  "exists_in_B": true
}

PUT test/_doc/2?refresh
{
  "key2" : "value2"
}

GET test/_search
{
  "query": {
    "bool": {
      "must_not": [
        { "exists": { "field": "exists_in_B"}}
      ]
    }
  }
}

This assumes that both indices are written at the same time, or that the boolean field can be added and updated in a timely and efficient manner. While set notation isn't a native feature of Elasticsearch, it would be a good addition for some use cases. Hopefully this method helps. I'd love to help out more if this isn't something you can do in your case.
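
To make the flagging idea a bit more concrete, here is a rough, untested-against-a-cluster sketch in Python. It only builds request bodies: one for the _bulk API that sets exists_in_B on documents in A, and one with the must_not/exists query from the example above (the same body could be sent to _count to get just the number). It assumes that a document in B has the same _id as its counterpart in A, which the thread does not confirm. The index name "index_a" and all function names are hypothetical.

```python
import json

FLAG_FIELD = "exists_in_B"  # field name taken from the example above


def build_flag_bulk_body(index_a, ids_in_b):
    """Build an NDJSON body for the _bulk API.

    Each document is a partial update that sets the flag on the
    document in index A whose _id matches an _id found in B.
    Assumption: A and B use the same _id for the same element.
    """
    lines = []
    for doc_id in ids_in_b:
        # Action line, followed by the partial document to merge in.
        lines.append(json.dumps({"update": {"_index": index_a, "_id": doc_id}}))
        lines.append(json.dumps({"doc": {FLAG_FIELD: True}}))
    # Every line of a _bulk body, including the last, ends with a newline.
    return "".join(line + "\n" for line in lines)


def build_difference_query():
    """Query body matching documents in A that do not carry the flag."""
    return {
        "query": {
            "bool": {
                "must_not": [{"exists": {"field": FLAG_FIELD}}]
            }
        }
    }


def difference_count(count_a, count_b):
    """Count of A minus B when B is a subset of A with no duplicates.

    Under that assumption (stated in the original question) the
    difference size is a plain subtraction of the two document counts.
    """
    if count_b > count_a:
        raise ValueError("B cannot be a subset of A if it has more documents")
    return count_a - count_b


if __name__ == "__main__":
    # "index_a" and the ids below are placeholders for illustration only.
    print(build_flag_bulk_body("index_a", ["1", "2"]), end="")
    print(json.dumps(build_difference_query(), indent=2))
    print(difference_count(10, 3))
```

If only the number of excluded elements is needed and B really is a duplicate-free subset of A, the subtraction in difference_count may be enough on its own, with no flag field at all. The flag becomes useful once you need to retrieve the excluded documents themselves.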
