Remove duplicate records and get the smallest record id


(Venkat Venkat) #1

Assuming I have data like:

rowid_xfef,city,postal,address,name
1,c1,p1,a1,n1
2,c1,p1,a1,n1
3,c2,p2,a2,n2
4,c3,p3,a3,n3
5,c3,p3,a3,n3
6,c3,p3,a3,n3

I need to match records on city, postal, address, and name, and for each group of matching records print only the one with the lowest rowid.

Output should be:
1,c1,p1,a1,n1
3,c2,p2,a2,n2
4,c3,p3,a3,n3

Please do the needful.


(Loren Siebert) #2

Check out this info on Multi-field terms aggregation.
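
As a rough sketch of what such a request could look like, the snippet below builds a search body that groups on the four fields with a multi_terms aggregation and uses a min sub-aggregation to find the lowest rowid per group. It assumes the four fields are mapped as keyword, that rowid_xfef is numeric, and that your Elasticsearch version supports multi_terms; the function name, the aggregation names, and max_groups are made up for illustration.

```python
import json


def build_duplicate_query(key_fields, id_field, max_groups=1000):
    """Build a search body: one bucket per unique key tuple, with the lowest id in each.

    Assumption: every field in key_fields is a keyword field and id_field is numeric.
    """
    return {
        "size": 0,  # only the aggregation results are needed, not the hits
        "aggs": {
            "by_key": {
                "multi_terms": {
                    "terms": [{"field": f} for f in key_fields],
                    "size": max_groups,  # hypothetical cap on returned groups
                },
                "aggs": {
                    "lowest_rowid": {"min": {"field": id_field}},
                },
            }
        },
    }


query = build_duplicate_query(["city", "postal", "address", "name"], "rowid_xfef")
print(json.dumps(query, indent=2))
```

Each bucket in the response would then carry the key tuple and a lowest_rowid value, which identifies the record to keep.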

If you happen to be using Logstash to ingest the data, consider adding a fingerprint field computed from those four fields. A single field can be aggregated much more quickly to determine where the duplicates are. Or you can use the fingerprint as the document ID, so that you keep just one copy of each unique (city, postal, address, name) tuple.
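
To illustrate the fingerprint idea outside of Logstash, here is a minimal Python sketch that hashes the four fields into one key and keeps the lowest rowid per key, using the sample data from the question. It is not the Logstash fingerprint filter itself; the SHA-1 choice, the separator, and all function names are assumptions made for the example.

```python
import csv
import hashlib
import io

# Sample data copied from the question.
SAMPLE = """rowid_xfef,city,postal,address,name
1,c1,p1,a1,n1
2,c1,p1,a1,n1
3,c2,p2,a2,n2
4,c3,p3,a3,n3
5,c3,p3,a3,n3
6,c3,p3,a3,n3
"""

ID_FIELD = "rowid_xfef"
KEY_FIELDS = ("city", "postal", "address", "name")


def fingerprint(row):
    """Hash the key fields into one value (assumed scheme: SHA-1 over joined fields)."""
    joined = "\x1f".join(row[f] for f in KEY_FIELDS)  # unit separator avoids ambiguous joins
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


def dedupe_lowest_rowid(csv_text):
    """Return one row per fingerprint, keeping the row with the lowest rowid."""
    best = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        fp = fingerprint(row)
        if fp not in best or int(row[ID_FIELD]) < int(best[fp][ID_FIELD]):
            best[fp] = row
    return sorted(best.values(), key=lambda r: int(r[ID_FIELD]))


def format_rows(rows):
    """Render rows back into the comma-separated form used in the question."""
    return [",".join(r[f] for f in (ID_FIELD,) + KEY_FIELDS) for r in rows]


for line in format_rows(dedupe_lowest_rowid(SAMPLE)):
    print(line)
```

The comparison on rowid makes the result independent of the order in which records arrive, so the lowest rowid wins even if the input is not sorted.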

