Create histogram for size of each unique value in an index

Andrew_McFague · November 12, 2015, 7:57pm

My cluster has around 3.5 billion documents, and each document has a list of "foreign IDs" that are relevant to the health of the system it's tracking. However, these foreign IDs are constantly changing, so I'd like to set up a sort of histogram, over time, of the indexes, so that other systems can detect when there is a substantial increase or decrease in the number of documents matched by a given foreign ID.

For example,

DocA: {"foreignIDs": [1, 2, 3]}
DocB: {"foreignIDs": [2, 3]}
DocC: {"foreignIDs": [1,4]}

I'd like to regularly poll and get the number of documents per foreign ID, such as:

{1: 2, 2: 2, 3: 2, 4: 1}
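To make the desired result concrete, here is a minimal Python sketch of the counting and the change detection, done client-side over the three example documents. It only illustrates the output I'm after, not how I'd expect Elasticsearch to compute it; the function names and the 50% threshold are made up for the example.

```python
from collections import Counter


def count_docs_per_foreign_id(docs):
    """Return {foreign_id: number of documents that list it}."""
    counts = Counter()
    for doc in docs:
        # set() so a document listing the same ID twice is counted once
        counts.update(set(doc["foreignIDs"]))
    return dict(counts)


def changed_ids(previous, current, threshold=0.5):
    """Return IDs whose count moved by at least `threshold` (relative).

    `threshold` is a hypothetical tuning knob, not an Elasticsearch setting.
    """
    changed = {}
    for fid in previous.keys() | current.keys():
        before = previous.get(fid, 0)
        after = current.get(fid, 0)
        baseline = max(before, 1)  # avoid dividing by zero for new IDs
        if abs(after - before) / baseline >= threshold:
            changed[fid] = (before, after)
    return changed


docs = [
    {"foreignIDs": [1, 2, 3]},  # DocA
    {"foreignIDs": [2, 3]},     # DocB
    {"foreignIDs": [1, 4]},     # DocC
]
snapshot = count_docs_per_foreign_id(docs)
print(snapshot)
```

Each poll would produce one such snapshot, and the consuming systems would compare consecutive snapshots with something like `changed_ids(previous, current)`.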

Currently, I can use the terms aggregation to get a complete list, but this seems to be a very expensive operation and can overwhelm the cluster, causing updates to time out. It also buckets the data, which can allegedly result in inaccurate counts. It also returns a MASSIVE amount of data.
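For reference, this is roughly the shape of the request I mean, sketched in Python as a plain request body plus a helper that flattens the buckets into the map shown above. The field name `foreignIDs` comes from the example documents; the aggregation name `per_foreign_id`, the helper names, and the bucket `size` value are assumptions for illustration, and the sample response is hand-written, not real cluster output.

```python
import json


def build_terms_request(field, size):
    """Build a search body that asks only for a terms aggregation."""
    return {
        "size": 0,  # skip search hits, return aggregation results only
        "aggs": {
            "per_foreign_id": {
                "terms": {"field": field, "size": size}
            }
        },
    }


def buckets_to_counts(response, agg_name="per_foreign_id"):
    """Flatten terms buckets into {key: doc_count}."""
    agg = response["aggregations"][agg_name]
    return {bucket["key"]: bucket["doc_count"] for bucket in agg["buckets"]}


body = build_terms_request("foreignIDs", size=1000)
print(json.dumps(body, indent=2))

# Hand-written sample in the shape of a terms aggregation response.
sample_response = {
    "aggregations": {
        "per_foreign_id": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": 1, "doc_count": 2},
                {"key": 2, "doc_count": 2},
                {"key": 3, "doc_count": 2},
                {"key": 4, "doc_count": 1},
            ],
        }
    }
}
counts = buckets_to_counts(sample_response)
```

With billions of documents and a large number of distinct IDs, the bucket list in that response is what becomes so large.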

Is there any recommended way to get this information out of Elasticsearch without negatively impacting the cluster? Does Elasticsearch itself provide any means of statically maintaining this count? Or at least a way to optimize the lookups?

Thanks for all your help, and keep up the great work!

Andrew

(JDK8, Elasticsearch 2.0.0)
