Remove Duplicate records

Hi Folks,

I'm using Kibana to extract records from Elasticsearch.
I'm using this query:

GET processos/_search/?size=100&pretty=true
{
  "query": {
    "dis_max": {
      "queries": [
        {
          "match": {
            "movimento.movimentoNacional.codigoNacional": "22"
          }
        },
        {
          "match": {
            "movimento.movimentoNacional.complemento": "baixa definitiva"
          }
        },
        {
          "match": {
            "movimento.movimentoLocal.descricao": "baixa definitiva"
          }
        }
      ],
      "tie_breaker": 0
    }
  }
}

Example results of the query are located here: records

I'd like to remove the duplicates.
I tried to do the whole process in a single query:
{
  "size": 100,
  "aggs": {
    "duplicateCount": {
      "terms": {
        "field": "dadosBasicos.numero.keyword",
        "min_doc_count": 2
      },
      "aggs": {
        "duplicateDocuments": {
          "top_hits": {}
        }
      }
    }
  }
}

The records must be distinct on these three fields:
dadosBasicos.numero.keyword
dadosBasicos.numeroClasse.keyword
siglaTribunal

Can you help with that?

Thanks in advance,

Best Regards,

--DJ

What data do you need from these documents? AFAIK, there's no way to "dedupe" in a query like that, and even if there is, I'm not sure how you'd indicate which of the duplicate records you'd want (and your choices are likely "first" or "last" based on some field, probably timestamp).

It sounds like you're just looking for unique fields based on dadosBasicos.numero.keyword, dadosBasicos.numeroClasse.keyword, and siglaTribunal, which you can do with three terms aggs based on those fields, but it would require you to be able to turn any other fields in that document into some metric. That is, you could see the total count of all docs, or some rolled up value of numeric fields (min, max, mean, etc). If you have other fields you need to see, and they are not numeric values, then a simple aggregation like that may not work. That's why I'm curious what information you need to see from these "unique" results.
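As a rough sketch of the three nested terms aggs described above, the body of a GET processos/_search request could look something like the following. The aggregation names (by_numero, by_classe, by_tribunal) are made up for illustration, and siglaTribunal.keyword assumes that field has a keyword sub-field in the mapping; neither is confirmed by anything in this thread.

```json
{
  "size": 0,
  "aggs": {
    "by_numero": {
      "terms": { "field": "dadosBasicos.numero.keyword" },
      "aggs": {
        "by_classe": {
          "terms": { "field": "dadosBasicos.numeroClasse.keyword" },
          "aggs": {
            "by_tribunal": {
              "terms": { "field": "siglaTribunal.keyword" }
            }
          }
        }
      }
    }
  }
}
```

Each innermost bucket is one unique combination of the three values, and its doc_count says how many documents share that combination. A terms agg only returns its top 10 buckets by default, so the "size" of each level would probably need to be raised to see every combination.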

Yes Joe,
I want to aggregate by these three fields. How can I do this?
I'd like to put the aggregation inside the same query that gets the search results.
Can you give me a clue how to do that?
I have only seen examples of aggs on a single field.
I tried this:
"aggs": {
  "duplicateCount": {
    "terms": {
      "field": "dadosBasicos.numero.keyword",
      "field": "dadosBasicos.numeroClasse.keyword",
      "field": "siglaTribunal.keyword",
      "field": "grau.keyword",
      "min_doc_count": 2
    },
    "aggs": {
      "duplicateDocuments": {
        "top_hits": {}
      }
    }
  }
}

But I want a unique result set, aggregated by those fields.
How can I do that?

Thank you very much for your attention,

Best regards,

--DJ
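
Repeating "field" inside one terms object does not combine the fields: a JSON object can carry each key only once, so the request is either rejected or ends up using a single field. One possible way to get one bucket per combination of fields, on Elasticsearch versions that include the composite aggregation, is sketched below as the body of a GET processos/_search request. It reuses the dis_max query from the first post. The names unique_processos and one_document are invented for this example, siglaTribunal.keyword is assumed to exist in the mapping, and the approach is a suggestion rather than something confirmed in this thread.

```json
{
  "size": 0,
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "movimento.movimentoNacional.codigoNacional": "22" } },
        { "match": { "movimento.movimentoNacional.complemento": "baixa definitiva" } },
        { "match": { "movimento.movimentoLocal.descricao": "baixa definitiva" } }
      ],
      "tie_breaker": 0
    }
  },
  "aggs": {
    "unique_processos": {
      "composite": {
        "size": 100,
        "sources": [
          { "numero": { "terms": { "field": "dadosBasicos.numero.keyword" } } },
          { "numeroClasse": { "terms": { "field": "dadosBasicos.numeroClasse.keyword" } } },
          { "siglaTribunal": { "terms": { "field": "siglaTribunal.keyword" } } }
        ]
      },
      "aggs": {
        "one_document": {
          "top_hits": { "size": 1 }
        }
      }
    }
  }
}
```

Each bucket then holds one distinct (numero, numeroClasse, siglaTribunal) combination together with a single representative document, which by default is the highest-scoring match; a "sort" inside top_hits would be the place to pick, say, the most recent one instead. With "size": 0 the ordinary hits list is empty and the deduplicated records are read from the buckets. For more than 100 combinations, the after_key returned in the response can be passed back as "after" in the composite block to fetch the next page.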
