Elasticsearch losing documents

Using ELK 7.8.
We have a Logstash pipeline from JDBC database to Elasticsearch.
Using persisted queues and DLQ.
We lose about 1% of the documents we send to Elasticsearch.
We have enabled TRACE logging on Elasticsearch and the missing documents show up in the log; their processing looks similar to that of other documents that do get inserted. We get no errors, but the documents are not there when we try to fetch them.
For testing purposes, we have also configured two Logstash outputs writing to two different indices in the same pipeline, and both miss the same documents.

Are you assigning document IDs yourself? If so, is there any possibility there could be duplicates? What does your Elasticsearch output plugin config look like?

If you are assigning IDs yourself, what happens if you change one of the outputs to not do so and let Elasticsearch set an ID instead?
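For example, something like this (index name is just an illustration; hosts, credentials, and SSL settings would stay as in your existing output):

```
output {
  elasticsearch {
    index => "test-autogen"  # separate test index
    hosts => "XXX"
    # no document_id setting, so Elasticsearch generates the _id itself
  }
}
```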

output {
  elasticsearch {
    index => "XXX"
    hosts => "XXX"
    user => "XXX"
    password => "XXX"
    ssl => true
    ssl_certificate_verification => true
    cacert => "XXX"
    document_id => "%{element_id}"
    action => "index"
    ilm_enabled => false
  }
}

The ID we use is the primary key of the document's row in the relational database. We need it to support updates and deletes.


Create another output where you do not set the ID, writing to a different index. Index data and compare the document count. That should indicate whether it is an ID issue or not.

Do you see anything in the DLQ?
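If it helps, a quick way to check is to look at the DLQ directory on disk. Assuming a default package install where `path.data` is /var/lib/logstash and the pipeline id is `main` (adjust both to your setup):

```
ls -l /var/lib/logstash/dead_letter_queue/main/
```

If the directory contains only a tiny sentinel segment file, nothing has been written to the DLQ.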

Are the mappings of the indices set through an index template or generated automatically?

Welcome to our community! :smiley:

FWIW, 7.8 is very much EOL and no longer supported; you should be looking to upgrade as a matter of urgency.

I created another destination that uses auto-generated IDs. The documents are still missing from that index too.
There is nothing in the DLQ.
The mapping was generated automatically, but if I resend the missing documents, they get inserted the second time around, so this does not look like a data or mapping issue.

Do you have a non-zero deleted document count against your indices?
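One way to check (index pattern is a placeholder):

```
GET _cat/indices/your-index-*?v&h=index,docs.count,docs.deleted
```

A non-zero docs.deleted would suggest documents are being overwritten, since an overwritten document leaves its old version marked as deleted until segment merges clean it up.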

If exactly the same documents are missing from two indices, one of which uses auto-generated IDs, and the documents can be inserted later without mapping conflicts, I do not think the issue is necessarily in Elasticsearch. If you add another output and write the unique IDs to a file, you should be able to verify that Logstash actually processes all the data.
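A sketch of such an extra output (the file path is just an example):

```
output {
  file {
    path => "/tmp/element_ids.log"
    codec => line { format => "%{element_id}" }
  }
}
```

Comparing the number of unique IDs in that file with the document count in the index should show whether Logstash ever emitted the missing events.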

If the problem is not in Elasticsearch, where else would it be? We are losing about 1% of our documents, and their IDs show up in the Elasticsearch trace, which means Elasticsearch did some sort of processing on them after receiving them from Logstash.

Missed that.

Did you perform a refresh on the index before checking that the documents are there?
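For reference, a refresh can be forced with (index name is a placeholder):

```
POST your-index/_refresh
```

Until a refresh happens (every 1s by default, controlled by index.refresh_interval), newly indexed documents are not visible to search.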

A few things:

  1. It's been a long time since Elasticsearch lost data, and even then it was a very edge case. We've put a lot of work into resilience to prevent this sort of thing from occurring, so it is unlikely Elasticsearch is losing them.
  2. 7.8 is old; you need to upgrade as it is no longer supported.
  3. It's highly likely that what Christian is saying is what is happening. This is why I asked about deleted documents: if one document is overwritten, the original is marked as deleted.

A refresh did not make any difference.
docs.deleted is zero on the index with the auto-generated IDs; that index is still missing documents, however.
docs.deleted is non-zero on the index with the assigned IDs, but this was expected since we update and delete documents there.

Actually, upon further checking, it looks like the Logstash JDBC input is losing these documents; they never make it to Elasticsearch, as shown by the Elasticsearch trace. If we manually run the SQL queries shown in the Logstash log, all the documents show up, but a tiny fraction of them never reach Elasticsearch or appear in the Elasticsearch trace.
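One way to confirm where the events disappear is the Logstash node stats API (on the default API port 9600):

```
curl -s 'http://localhost:9600/_node/stats/pipelines?pretty'
```

The per-plugin event counters for the jdbc input and the elasticsearch output should show whether the input ever emitted the missing rows in the first place.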

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.