Get number of unique results from multiple indices

I have 5 indices with the same schema; each one represents one dataset among many, and each is around 1 TB in size. I want to issue a query against all of the indices at the same time and then count the results after removing duplicate document IDs (there will be many duplicates across the indices). I am interested only in the document IDs, just to determine the number of unique results so that I can show it to the user. How could I do that efficiently? Is it possible at all with Elasticsearch? I can't simply fetch all of the results, because a query can match every document in an index.

Sounds like a terms agg might do what you want?

I found the following solution:

    {
      "aggs": {
        "variants_count": {
          "cardinality": { "field": "variantId" } 
        }
      }
    }

The cardinality aggregation gives me what I want. However, I have an issue: there are 4 identical paged searches (only the indices differ) that are constructed using MultiSearch, and I am not sure whether it is possible to do something like this in elasticsearch-dsl:

    ms.aggs.bucket('variants_count', 'cardinality', field='variantId')

where ms is of type MultiSearch, but run over all pages and indices.

What should the approach be? How could I achieve that?

Seems like I am looking for something like merging of MultiSearch into a single query.
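
One possible way to do that merging, sketched under assumptions rather than confirmed anywhere in this thread: a single `_search` request can target several indices at once by joining their names with commas, so the cardinality aggregation is computed over all of them together and duplicate `variantId` values across indices are counted once. Summing the per-index counts from separate searches would count those duplicates several times. The index names and the helper names below are hypothetical. In elasticsearch-dsl the equivalent should be a plain `Search(index=[...])` with `s.aggs.metric('variants_count', 'cardinality', field='variantId')` instead of a `MultiSearch`. Keep in mind that the cardinality aggregation returns an approximate count.

```python
import json


def build_unique_count_request(indices, query, field="variantId",
                               precision_threshold=None):
    """Build (path, body) for one _search call spanning several indices.

    All names here are illustrative; only the request shape matters.
    """
    cardinality = {"field": field}
    if precision_threshold is not None:
        # Optional accuracy/memory trade-off of the cardinality aggregation.
        cardinality["precision_threshold"] = precision_threshold
    body = {
        "size": 0,  # no hits needed, only the aggregation result
        "query": query,
        "aggs": {"variants_count": {"cardinality": cardinality}},
    }
    # Comma-separated index names make one search cover all of them.
    path = "/" + ",".join(indices) + "/_search"
    return path, body


def unique_count(response):
    """Read the (approximate) unique count from a search response dict."""
    return response["aggregations"]["variants_count"]["value"]


# Hypothetical index names; send the body with any HTTP or ES client.
path, body = build_unique_count_request(
    ["dataset_1", "dataset_2"], {"match_all": {}}
)
print("POST", path)
print(json.dumps(body, indent=2))
```

Because `size` is 0, no paging is involved: the aggregation runs over every document matching the query, not just one page of hits.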
