Duplicate Issue - document_id, how to prevent overwriting of entries

Sampson_Light · February 16, 2021, 10:09am

Hi All,

Some background information:

I have duplicate entries in my Elasticsearch indices.

I have used document_id, which prevents duplicates from appearing. The problem is that an incoming duplicate overwrites the existing document, effectively replacing the older copy, which is the 'correct' one.

Can anyone point out whether there is any way to prevent the overwriting/updating from happening, and instead tell Elasticsearch to ignore a duplicate that was detected via document_id?

Wolfram_Haussig · February 16, 2021, 10:44am

Hi,

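Elasticsearch has a parameter called op_type; see the Index API page of the Elasticsearch Reference [7.11].

Set this parameter to create so that a document is indexed only if its ID does not already exist. If the ID exists, the request fails with a version conflict instead of overwriting the stored document.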

Best regards
Wolfram

Sampson_Light · February 16, 2021, 12:10pm

Hi Wolfram,

Appreciate your response.

If you don't mind, could you point out how I can apply your recommendation?

For example, am I supposed to edit a config file somewhere, or am I supposed to do something in the same logstash.conf file?

Regards
Sam

Wolfram_Haussig · February 16, 2021, 12:25pm

Hello Sam,

This depends on how you are ingesting the data.
When indexing directly into Elasticsearch, op_type is a URL parameter:

PUT my-index-000001/_doc/1?op_type=create

When using Logstash, you can configure it in the elasticsearch output:

    output {
      elasticsearch {
        action => "create"
      }
    }
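
For context, a fuller version of that output might look like the sketch below. The hosts value, the index name, and the event field used for document_id (here a hypothetical [event_id]) are assumptions, not details from this thread; substitute whatever your pipeline already uses.

```
output {
  elasticsearch {
    hosts       => ["http://localhost:9200"]   # assumption: local single-node cluster
    index       => "my-index-000001"           # assumption: index name borrowed from the API example above
    document_id => "%{[event_id]}"             # assumption: hypothetical field that uniquely identifies an event
    action      => "create"                    # index only if no document with this ID exists yet
  }
}
```

With action => "create", Elasticsearch rejects a duplicate ID with a 409 version conflict. As far as I know, the output plugin then logs a warning and drops that event rather than overwriting the stored document, so expect conflict warnings in the Logstash log when duplicates arrive.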

Best regards
Wolfram

Sampson_Light · February 16, 2021, 1:35pm

Hi Wolfram,

Apologies, I forgot to mention how I am ingesting the data.

Yes, I am ingesting it via Logstash.

From the example you showed me, all I need to do is add the action => "create" line to my conf file.

After that, Logstash will no longer index over documents whose ID already exists: if I simulate a duplicate this time, nothing will happen (unlike before, when documents were overwritten).

Am I interpreting this correctly?

Regards
Sam

Wolfram_Haussig · February 16, 2021, 1:44pm

Hello Sam,

Yes, adding action => "create" should solve your issue.

Best regards
Wolfram
