DISTINCT values DSL query

Hello,

I've been reading a lot about this topic because I've seen it has been asked before, but I can't get it to work yet.

I am trying to get unique values from an index.

I have something like this:

id | app_name       | url
1  | app_1          | https://subdomain.app_1.com
2  | app_1          | https://app_1.com
3  | app_2          | https://app_1.com
4  | app_3          | https://subdomain.app_3.com
5  | app_1          | https://app_3.com

I would like to receive just the distinct app_name values:

app_1
app_2
app_3

The query I tried with aggs is:

GET app_index/_search
{
  "aggs": {
    "unique_apps": {
      "terms": {
        "field": "app_name",
      }
    }
  }
}

I also tried a kind of group by here:

GET app_index/_search
{
  "aggs": {
    "unique_apps": {
      "terms": {
        "field": "app_name.keyword"
      },
      "aggs": {
        "oneRecord": {
          "top_hits": {
            "size": 1
          }
        }
      }
    }
  }
}

But I still receive all the apps.

  • Is there a way to receive unique values?
  • Or is there maybe a way to check in Logstash whether some value already exists in the database and avoid sending it again? Perhaps using the fingerprint plugin to generate a unique _id based on the value of the field? If I receive the same information in that field it would generate the same ID, so it wouldn't be saved again.

I also checked whether there's any way to create unique fields in Elasticsearch, but I see it's not possible.

Thank you very much for your help and time :slight_smile:

Your second query (with app_name.keyword) is correct. I think you are seeing all the hits in the results, which are the records returned by the query. If you scroll to the bottom of the response you should see the aggregation. Most of the time when doing aggregations you don't need the hits, so you can remove them with "size": 0 as below and it will only return the aggs.

GET app_index/_search
{
  "size": 0,
  "aggs": {
    "unique_apps": {
      "terms": {
        "field": "app_name",
      }
    }
  }
}
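
For reference, the aggregations part of the response should then look something like this (the buckets below are just illustrative, based on your sample rows):

"aggregations": {
  "unique_apps": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      { "key": "app_1", "doc_count": 3 },
      { "key": "app_2", "doc_count": 1 },
      { "key": "app_3", "doc_count": 1 }
    ]
  }
}

The distinct app_name values are the "key" entries inside the buckets array.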

Thank you for your answer @aaron-nimocks

In this case I see that there are some values in the buckets list inside the aggregations, but unfortunately not the documents themselves.
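
Combining that "size": 0 with my second query (the .keyword field plus the top_hits sub-aggregation) should in theory return one sample document per distinct app_name inside each bucket; a sketch:

GET app_index/_search
{
  "size": 0,
  "aggs": {
    "unique_apps": {
      "terms": {
        "field": "app_name.keyword"
      },
      "aggs": {
        "oneRecord": {
          "top_hits": {
            "size": 1
          }
        }
      }
    }
  }
}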

In the end, though, I created another index with unique values based on that string, using the fingerprint plugin. I'm not sure it's the best option, but I need to extract a lot of information and it was taking a lot of time otherwise.

I'll share what I did; maybe it can be helpful to someone else who wants to do something similar.



  • Is there a way to receive unique values?

I used the fingerprint plugin in this case. It generates a unique ID based on the string; e.g., if I receive the same app_name it will always generate the same _id, so it won't be duplicated in Elasticsearch. I added this config to the logstash.conf pipeline, specifically in the filter section:

  fingerprint {
    source => ["app_name"]
    target => "unique_id_by_app_name"
    method => "SHA1"
  }

Then in the output:

    elasticsearch {
      hosts => "localhost:9200"
      index => "logstash_apps"
      document_id => "%{[unique_id_by_app_name]}"
    }

If I receive app_1 again, with the same or even different data, I'll get the same ID because of the hashing:

$ -> echo -n "app_1" | sha1sum | awk -F '  -' '{print $1}'
87dbad46d7c47f3714eb02ff70e18b94e4ee6523
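
With that in place, the document for app_1 can be fetched directly by its hash (index and ID as configured above):

GET logstash_apps/_doc/87dbad46d7c47f3714eb02ff70e18b94e4ee6523

And if app_1 comes in again, Logstash just overwrites that same document instead of creating a duplicate.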
