How to diff multiple arrays in ES?

I am using Elasticsearch 6.0.1 and I have the following 3 indexes.

index1
{
  'id': 1,
  'list': [ 1, 2, 3, 4, 5, 6, 7, 8, 9 ]
}

index2
{
  'id': 1,
  'list': [ 1, 2, 3, 100, 101, 102, ... ]
}

index3
{
  'id': 1,
  'list': [ 4, 5, 6, 200, 201, 202, ... ]
}

Objective:

I want to get an array containing all the entries from index1's list that are not present in the union of index2's list and index3's list.

For example, with the documents above, the result I'd like to get is [ 7, 8, 9 ].

How do I go about doing this in Elasticsearch?

Elasticsearch is a search and aggregation engine only. It does not have any capabilities for processing datasets like that.

If your dataset is really large, you could try es-hadoop and do this kind of processing with Apache Spark, for instance.
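
For small lists, the difference can also be computed on the client once the three documents have been fetched. Below is a minimal sketch in plain Python, assuming the documents were already retrieved from Elasticsearch by some other means. The function name `list_difference` and the variable names are hypothetical, and the sample lists contain only the values shown in the question (the elided entries are left out).

```python
def list_difference(base, *others):
    """Return the entries of `base` that appear in none of `others`, keeping order."""
    excluded = set()
    for other in others:
        # Build the union of all lists to subtract.
        excluded.update(other)
    return [item for item in base if item not in excluded]


# Sample data: only the values visible in the question; elided entries are omitted.
index1_list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
index2_list = [1, 2, 3, 100, 101, 102]
index3_list = [4, 5, 6, 200, 201, 202]

result = list_difference(index1_list, index2_list, index3_list)
print(result)  # [7, 8, 9]
```

Building the union as a set keeps each membership check constant time on average, so the cost grows roughly with the combined length of the lists.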
