How to get all unique values of a field for a single index?

Tom_Somerville · January 17, 2020, 4:12pm

Hey All,
I'm running into an issue that I cannot seem to find a solution for. I currently have a number of indices, each of them billions of docs large, and I need a way to extract all unique values of any given field in an index. Most of the data in the field is duplicated; I would expect about 50m-100m unique values across the 1bn+ docs.

When I do a terms aggregation I'm limited to 10,000 buckets, so I cannot get more than 10k unique values, and I cannot figure out how to paginate the aggregation buckets. (I can scroll the hits, but that is of no value to me, as it just gives me every doc, 10k at a time, until I have scrolled through all 1bn+ docs.)

Also, I can't seem to get a composite aggregation to work. Again, the ['hits']['hits'] are not unique values, and when I look at the buckets, I am being returned values that do not exist: they start with hyphens and contain characters that are illegal for the field type (like * and :).

Can anyone explain to me how I can grab all 50m unique values of a field in an index? Currently the only way I can actually get this to work is to scroll through every doc, extract the value, and dedupe. But with 2-3 second queries at 10k docs at a time, this will take over 3 days!
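As a rough check on that estimate, the arithmetic can be sketched as below. The function name `scroll_days` is made up for illustration, and the inputs are the round figures quoted above (1bn docs, 10k docs per request, 2-3 seconds per request), not measured values.

```python
SECONDS_PER_DAY = 86_400


def scroll_days(total_docs: int, page_size: int, seconds_per_request: float) -> float:
    """Estimate the wall-clock days needed to scroll through every doc sequentially."""
    # Ceiling division: a partially filled last page still costs one request.
    requests_needed = -(-total_docs // page_size)
    return requests_needed * seconds_per_request / SECONDS_PER_DAY


if __name__ == "__main__":
    for seconds in (2, 3):
        days = scroll_days(1_000_000_000, 10_000, seconds)
        print(f"{seconds} s per request: about {days:.1f} days")
```

With these figures the scroll needs 100,000 requests, which works out to somewhere between roughly 2.3 and 3.5 days per billion docs, before any time spent deduplicating.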

I have put my queries below. I have also tried playing around with the doc and bucket size parameters, with no luck.

What I have tried:

Attempt 1: Scrolling terms agg

GET /dns-2020.01.13/_search?scroll=1m
{
  "aggs": {
    "unique_quieries": {
      "terms": {
        "field": "query.keyword",
        "size": 10000
      }
    }
  }
}

Followed up with:

GET /_search/scroll
{
  "scroll": "5m",
  "scroll_id": "DnF1ZXJ5VGhlbkZldGNoBAAAAAAAEVZiFkx1REFCM0FXU3dpNGpFOEZXNnZaRGcAAAAAABB4zRY1NHdiS2dzc1JaZXhoSjNTaFZvbmVRAAAAAAAP690WOEZWeFd5cU1Td3VIVWFvbEo0MXh0ZwAAAAAAPubpFkl5RDl6b1d2U1JxMmMzejQ3V05odlE="
}

Attempt 2: Composite aggregation, returns odd values and non-unique hits.

GET /dns-2020.01.13/_search
{
  "track_total_hits": false,
  "aggs": {
    "my_buckets": {
      "composite": {
        "sources": [
          { "query": { "terms": { "field": "query.keyword" } } }
        ]
      }
    }
  }
}
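For reference, a composite aggregation is normally paged by sending the `after_key` from each response back as the `after` parameter of the next request. Scroll only pages the hits, and the aggregation results come back with the initial response only. Setting the top-level `"size": 0` suppresses the hits, so that only buckets are returned. The Python below is a minimal sketch of that loop, not a tested solution for this cluster. The `search` callable, the helper names and the page size of 1000 are assumptions made for illustration; `search` stands in for whatever sends the request body to `/dns-2020.01.13/_search` and returns the parsed JSON response.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical type: any callable that POSTs a body to _search and returns the parsed response.
SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def build_body(field: str, page_size: int, after: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Build one page of the composite aggregation request from Attempt 2."""
    composite: Dict[str, Any] = {
        "size": page_size,  # buckets per page; assumed value, tune to the cluster
        "sources": [{"query": {"terms": {"field": field}}}],
    }
    if after is not None:
        composite["after"] = after  # resume after the last bucket of the previous page
    return {
        "size": 0,  # return no hits, only aggregation buckets
        "track_total_hits": False,
        "aggs": {"my_buckets": {"composite": composite}},
    }


def iter_unique_values(search: SearchFn, field: str, page_size: int = 1000) -> Iterator[Any]:
    """Yield every distinct value of `field`, one composite page at a time."""
    after: Optional[Dict[str, Any]] = None
    while True:
        response = search(build_body(field, page_size, after))
        agg = response["aggregations"]["my_buckets"]
        buckets = agg["buckets"]
        if not buckets:
            return  # an empty page means every bucket has been seen
        for bucket in buckets:
            # The key is a dict keyed by source name ("query" in the request above).
            yield bucket["key"]["query"]
        after = agg.get("after_key")
        if after is None:
            return
```

With the official Python client, `search` could be something like `lambda body: es.search(index="dns-2020.01.13", body=body)`, assuming a client version that still accepts `body`. Each bucket key is already unique, so no client-side dedupe is needed, and the number of requests scales with the number of unique values rather than the number of docs.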

system · February 14, 2020, 4:12pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
