ElasticSearch: DSL Query

SalvoDM91 · September 20, 2022, 9:58am

Hi guys, I'm opening this topic because I have a problem with a large volume of data (14M).
My dataset looks like this:

{"h":{"id":"AA001","process":"AK01","update-timestamp":1663665372171}}
{"h":{"id":"AA002","process":"AK01","update-timestamp":1663665372171}}
{"h":{"id":"AA003","process":"AK01","update-timestamp":1663665372171}}
{"h":{"id":"AA004","process":"AK01","update-timestamp":1663665372171}}
{"h":{"id":"AA001","process":"AK01","update-timestamp":1663665372172}}
{"h":{"id":"AA001","process":"AK01","update-timestamp":1663665372173}}

If my pipeline worked correctly, each key (id and process) should have only one entry, the one with the most recent update-timestamp.
So I'm trying to count the duplicates: for each id/process pair, I want to know how many update-timestamps are associated with it.
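
To make the target concrete, here is a minimal sketch in Python of the count described above, run locally on the six sample documents only. The function names are made up for illustration, and this is not meant to scale to the full dataset; it only shows the expected result.

```python
import json
from collections import defaultdict

# The six sample documents from the post, one JSON object per line.
SAMPLE = """\
{"h":{"id":"AA001","process":"AK01","update-timestamp":1663665372171}}
{"h":{"id":"AA002","process":"AK01","update-timestamp":1663665372171}}
{"h":{"id":"AA003","process":"AK01","update-timestamp":1663665372171}}
{"h":{"id":"AA004","process":"AK01","update-timestamp":1663665372171}}
{"h":{"id":"AA001","process":"AK01","update-timestamp":1663665372172}}
{"h":{"id":"AA001","process":"AK01","update-timestamp":1663665372173}}
"""


def timestamps_per_key(lines):
    """Map each (id, process) pair to its number of distinct update-timestamps."""
    seen = defaultdict(set)
    for line in lines:
        h = json.loads(line)["h"]
        seen[(h["id"], h["process"])].add(h["update-timestamp"])
    return {key: len(stamps) for key, stamps in seen.items()}


def duplicated_keys(lines):
    """Keep only the pairs that have more than one update-timestamp."""
    return {key: n for key, n in timestamps_per_key(lines).items() if n > 1}


counts = timestamps_per_key(SAMPLE.splitlines())
duplicates = duplicated_keys(SAMPLE.splitlines())
print(duplicates)  # {('AA001', 'AK01'): 3}
```

On the sample, only AA001 / AK01 shows up, with 3 update-timestamps; the other three keys have exactly one each.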

I tried to do it in Kibana with a data table, but the volume is too high and it fails with an error.

Could you help me with the Query DSL?
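
One direction that might fit, sketched in Python under assumptions: a composite aggregation groups documents by several fields and can be paged through with its `after` parameter, so it does not need to hold every bucket at once. The field names `h.id.keyword` and `h.process.keyword` are guesses about the mapping (they are not confirmed anywhere in this thread), and `build_request` / `duplicated_buckets` are hypothetical helper names. The sketch only builds the request body and filters a response; sending it to the cluster is left out.

```python
import json


def build_request(page_size=1000, after_key=None):
    """Build a search body that groups documents by (id, process).

    Field names are assumptions about the index mapping.
    """
    composite = {
        "size": page_size,
        "sources": [
            {"id": {"terms": {"field": "h.id.keyword"}}},
            {"process": {"terms": {"field": "h.process.keyword"}}},
        ],
    }
    if after_key is not None:
        # Resume after the last bucket of the previous page.
        composite["after"] = after_key
    # "size": 0 skips the hits; only the aggregation is returned.
    return {"size": 0, "aggs": {"by_key": {"composite": composite}}}


def duplicated_buckets(response):
    """Return (id, process, doc_count) for buckets holding more than one document."""
    buckets = response["aggregations"]["by_key"]["buckets"]
    return [
        (b["key"]["id"], b["key"]["process"], b["doc_count"])
        for b in buckets
        if b["doc_count"] > 1
    ]


print(json.dumps(build_request(), indent=2))
```

Each page of the response carries an `after_key` that can be passed back as `after_key` to fetch the next page, until no buckets are returned. Note that `doc_count` counts documents per key, which equals the number of update-timestamps only if no two documents for the same key share a timestamp.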

Thanks in advance!
Salvo

system · October 18, 2022, 9:59am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
