Thanks in advance to everyone who reads this.
Background: I have a job that inserts a document when it executes. The job can fail and run again, inserting a second document. Both documents share the same linkId, so I know they belong to a single job. I need a count of ALL unique linkIds, no matter how many there are.
I first looked at the cardinality aggregation, but it is approximate: its precision_threshold setting caps at 40,000, and beyond that the count can become inaccurate.
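For reference, this is roughly what I tried; even with the threshold raised to its maximum, my understanding is the count is still only approximate once the number of distinct linkIds passes 40,000 (the index name here is just a placeholder):

POST /my-index/_search
{
  "size": 0,
  "aggs": {
    "distinct_attempts": {
      "cardinality": {
        "field": "linkId",
        "precision_threshold": 40000
      }
    }
  }
}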
Then I looked into the cumulative_cardinality pipeline aggregation, but that strategy is built on cardinality underneath, and the documentation makes no mention of whether the same inaccuracy applies. Is this method accurate for large numbers?
My aggs section looks like this:
"aggs": {
"unique_attempts": {
"date_histogram": {
"field": "timestamp",
"calendar_interval": "day"
},
"aggs": {
"distinct_attempts": {
"cardinality": {
"field": "linkId"
}
},
"total_attempts": {
"cumulative_cardinality": {
"buckets_path": "distinct_attempts"
}
}
}
}
}
So will this approach give me an accurate count of unique linkIds? Or do I need to make multiple calls to the API and join the results myself? If the API route is the answer, what strategy would be best? One idea I had is sketched below.
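For the multiple-call route, the only strategy I could think of (just a sketch; index name and page size are placeholders) is to page through a composite aggregation on linkId and count the buckets client-side:

GET /my-index/_search
{
  "size": 0,
  "aggs": {
    "unique_links": {
      "composite": {
        "size": 10000,
        "sources": [
          { "link": { "terms": { "field": "linkId" } } }
        ]
      }
    }
  }
}

Each response includes an after_key; passing it back as "after" in the next request pages through all distinct linkIds, and the total number of buckets across all pages would be the exact count. But that could mean a lot of round trips, so I'm not sure it's the right strategy.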
Many thanks; I know this is not an easy question.