Duplicate records in Elasticsearch

Hello Elastic Team,

We are building our Agile dashboards with the ELK Stack. We can extract data and push it to Elasticsearch without any issues on the initial load, but when we push incremental data we get duplicate records, which affects our visualizations. Below is one of the records we are pushing to ES:
{
"effort": 4,
"workitemcloseddate": null,
"workitemstatus": "Approved",
"workitemcreationdate": "2020-12-29T06:58:46.377Z",
"workitemcommitteddate": null,
"WorkitemType": "Product Backlog Item",
"valueArea": "Business",
"Stability": 0,
"Plannedeffort": 113,
"velocity": 0,
"workitemnum": "1269729",
"Cycletimestart": null,
"sprint": "CICD Sprint 23",
"sprintstartDate": "2021-01-04T00:00:00+00:00",
"sprintfinishDate": "2021-01-14T00:00:00+00:00",
"sprintstatus": "future",
"teamname": "Test Team",
"project": "ABCDemo",
}

workitemnum and WorkitemType together form a unique key and can be used to look up a record in the ES index. If any field such as workitemstatus or workitemcloseddate changes on the Azure Boards side, our incremental script fetches the record again, but Logstash pushes it straight into the ES index and the record ends up duplicated (for the same workitemnum and WorkitemType).

Is it possible to update the records without creating duplicates in ES? Can we handle this at the Logstash level, or with some sort of trigger job that updates the record with the new field values, or perhaps drops the previous record and keeps only the latest one?

Any pointers would be helpful.

Thanks,
Sachin

Try using a fingerprint filter to set the document id. See this thread.
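For anyone landing here later, a minimal sketch of what that can look like in a Logstash pipeline. The index name and Elasticsearch host are placeholders; the source fields are the two unique fields from the record above. Hashing them with the fingerprint filter and reusing the hash as the document id makes a re-pushed record overwrite the existing document instead of creating a duplicate:

```
filter {
  fingerprint {
    # Build a stable hash from the fields that uniquely identify a work item
    source => ["workitemnum", "WorkitemType"]
    concatenate_sources => true
    method => "SHA256"
    # Keep the hash in @metadata so it is not indexed as a field
    target => "[@metadata][fingerprint]"
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]   # adjust to your cluster
    index => "workitems"                 # hypothetical index name
    # Same fingerprint => same _id => the incremental push updates
    # the existing document rather than adding a new one
    document_id => "%{[@metadata][fingerprint]}"
  }
}
```

Note that with a fixed document_id, each incremental push fully replaces the stored document, which is exactly the "keep only the last record" behaviour asked about above.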

Thanks Badger, the fingerprint filter worked fine without any issues.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.