Hi there,
In our application we decided to use Elasticsearch to create a daily snapshot of some critical application data for visualizations.
The issue we are facing is missing records in our ETL process - for some reason not all the data we upload to Elasticsearch actually ends up in the index.
We have been running tests with various settings, and so far the following combination works best and does not produce missing records (a curl sketch of these settings follows the list):
Bulk indexing using curl from Java code (Talend)
Refresh interval disabled
Number of shards: 2
Replicas: 0
5 requests at a time with 300 documents per bulk request (this seems to work best)
With the refresh interval enabled at 30s and the document count dropped to 100 per request, we get missing records in the Elasticsearch index.
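For reference, the index settings above are applied with a curl call roughly like the following (a minimal sketch; the host and index name are placeholders, not our actual values):

```
# Sketch of the index settings described above; host and index name are placeholders.
# refresh_interval: -1 disables refresh; in the failing runs this was "30s" instead.
curl -s -H 'Content-Type: application/json' \
  -XPUT 'http://elasticsearch:9200/app-snapshot-index' \
  -d '{
    "settings": {
      "number_of_shards": 2,
      "number_of_replicas": 0,
      "refresh_interval": "-1"
    }
  }'
```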
We wrote a Talend job to retrieve the data from the line-of-business system and use curl inside Talend to do bulk inserts of the documents into Elasticsearch.
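A single bulk request from the Talend job looks roughly like the sketch below (the host, index name and batch.ndjson file are placeholder names, not our exact code); the jq line shows one way to surface the per-item errors flag that the _bulk response carries:

```
# Minimal sketch of one bulk request (300 documents per batch in our working runs).
# batch.ndjson is newline-delimited JSON in the _bulk format, e.g.:
#   {"index":{"_id":"1"}}
#   {"field1":"value1"}
curl -s -H 'Content-Type: application/x-ndjson' \
  -XPOST 'http://elasticsearch:9200/app-snapshot-index/_bulk' \
  --data-binary @batch.ndjson \
  | jq '{errors: .errors, reasons: [.items[] | select(.index.error != null) | .index.error.reason]}'
```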
Can someone shed some light on why we are seeing missing records when changing the above settings? We do not get any errors throughout the process and all the files are being processed.
The rows highlighted in green are the only settings that produced no missing records during our tests.
The "split rows" column is the document count in each bulk insert request.
System information:
Elasticsearch deployed in Azure Kubernetes Service.
Node pool made up of 3 nodes of VM type Standard_B12ms (12 vCPUs and 48 GiB memory).
K8s resource: StatefulSet, 3-node cluster, version 8.1.0 Docker image
Pod cpu limit: 8 cpus
Pod memory limit: 16Gi
JVM settings: -Xms8g -Xmx8g
The Talend job runs as a CronJob in the same cluster.
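For completeness, the cluster state and heap can be confirmed from inside the cluster roughly like this (the service host is a placeholder for our in-cluster Elasticsearch service):

```
# Sketch of the checks run against the cluster; the service host is a placeholder.
curl -s 'http://elasticsearch:9200/_cluster/health?pretty'
curl -s 'http://elasticsearch:9200/_cat/nodes?v&h=name,heap.max,ram.max,cpu'
```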
Any insight into why we keep seeing missing records would be much appreciated.