I set up a pipeline in Logstash that pulls data from a SQL database through the JDBC connector and inserts it into an Elasticsearch index on a daily basis. I configured a cron schedule and sync the pipeline with the SQL database on a date field.
My problem is:
How do I find the data that is not being inserted into the index? For example, 20 rows are inserted or updated in the SQL database, but when the pipeline runs at its scheduled time it inserts only 15 of them. The other 5 rows are not inserted, perhaps because of an incorrect field value or some other reason.
So how can I find this missing data?
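One way to see exactly which rows are missing is to compare the primary keys of the rows changed in SQL since the last run with the document IDs that actually reached the index. The sketch below assumes each row has a unique id column that the pipeline also uses as the Elasticsearch document ID; the function name find_missing_ids and the sample data are hypothetical, and in practice the two ID lists would come from a SQL query and an Elasticsearch search.

```python
def find_missing_ids(sql_ids, indexed_ids):
    """Return the source IDs that never made it into the index, in source order."""
    indexed = set(indexed_ids)  # set membership keeps each lookup O(1)
    return [row_id for row_id in sql_ids if row_id not in indexed]


if __name__ == "__main__":
    # Hypothetical run: 20 rows changed in SQL, but only 15 were indexed.
    sql_ids = list(range(1, 21))
    indexed_ids = [i for i in sql_ids if i not in (3, 7, 11, 15, 19)]
    print(find_missing_ids(sql_ids, indexed_ids))  # [3, 7, 11, 15, 19]
```

The rows returned this way can then be re-queried in SQL to inspect their field values.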
I suggest adding an additional output to your Logstash conf file to see whether the data is being dropped while it is pulled by the JDBC connector or while it is being ingested into Elasticsearch.
If the data is being dropped while ingesting into Elasticsearch, you can try sending those events to a failover index (using an if condition that checks which tags are present on the event)
or try to fix the error that occurs while ingesting the data into Elasticsearch.
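The tag-based routing can be sketched as follows. It mimics what an output-section conditional on the event's tags does: events carrying a failure tag go to a failover index, and everything else goes to the main index. The tag name _mapping_mismatch and both index names are hypothetical placeholders. Note that such a conditional only sees tags added inside Logstash (for example by an earlier filter); a rejection returned by Elasticsearch itself does not add a tag to the event.

```python
MAIN_INDEX = "sql-data"             # hypothetical main index name
FAILOVER_INDEX = "sql-data-failed"  # hypothetical failover index name
FAILURE_TAG = "_mapping_mismatch"   # hypothetical tag set by an earlier filter step


def route(event):
    """Pick the target index the way a tag-checking output conditional would."""
    if FAILURE_TAG in event.get("tags", []):
        return FAILOVER_INDEX
    return MAIN_INDEX


def route_all(events):
    """Group events by the index they would be sent to."""
    batches = {}
    for event in events:
        batches.setdefault(route(event), []).append(event)
    return batches


if __name__ == "__main__":
    events = [
        {"id": 1, "amount": 10},
        {"id": 2, "amount": "n/a", "tags": [FAILURE_TAG]},
    ]
    for index, batch in route_all(events).items():
        print(index, [e["id"] for e in batch])
```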
Yes, the data is being dropped while ingesting into Elasticsearch.
My problem is how to implement the if condition in the output section of the conf file:
if some data (a SQL row) is not inserted, it should go to the failover index.
If you have any reference or link, please share it.
Here is my conf file:
Cool. For now I would suggest finding out why Elasticsearch rejected those specific records. You can check with journalctl -fu logstash.service after starting the service, or look at your rubydebug output.
The most common reason for this is a data type mismatch, which you can fix by updating the index mapping in Elasticsearch;
or, once you know the reason, you can write a condition for that scenario to route the data to a failover index.
I ran into a similar exception when I was trying to ingest string data into a numeric field. I updated the index mapping and reloaded the data to fix it.
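Because a type mismatch is only reported after Elasticsearch rejects the document, one option is to check values against the expected field types before indexing and tag the offending rows. The sketch below is a simplified, hypothetical check: EXPECTED_TYPES and its field names are placeholders, only a few type families are handled, and it only roughly approximates Elasticsearch's own coercion rules (it is stricter in some cases, for example it rejects "1.5" for a long field).

```python
EXPECTED_TYPES = {"amount": "double", "quantity": "long", "active": "boolean"}  # hypothetical mapping


def fits_type(value, es_type):
    """Roughly check whether a value could be indexed into a field of es_type."""
    if value is None:
        return True  # null values are simply not indexed, not rejected
    if es_type in ("long", "integer"):
        if isinstance(value, bool):
            return False
        if isinstance(value, int):
            return True
        try:
            int(str(value).strip())  # numeric strings such as "42" are accepted
            return True
        except ValueError:
            return False
    if es_type in ("double", "float"):
        if isinstance(value, bool):
            return False
        try:
            float(str(value).strip())
            return True
        except ValueError:
            return False
    if es_type == "boolean":
        return isinstance(value, bool) or value in ("true", "false")
    return True  # keyword, text and other types are not checked here


def mismatched_fields(row, expected=EXPECTED_TYPES):
    """List the fields of a row whose values do not fit the expected type."""
    return [field for field, es_type in expected.items()
            if field in row and not fits_type(row[field], es_type)]


if __name__ == "__main__":
    row = {"amount": "12.5", "quantity": "ten", "active": True}
    print(mismatched_fields(row))  # ['quantity']
```

Rows with a non-empty result could then be tagged and routed to the failover index instead of being sent to the main index.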
You are right, it is a data type mismatch. I checked with the journalctl command.
Multiple fields are failing with data type mismatches.
I am trying to route the records with data type mismatches to a different index.
One way is to implement multiple filter conditions; another is to use the dead letter queue for the missing data. Link: Dead Letter Queues (DLQ) | Logstash Reference [7.12] | Elastic
Which approach is better, given that data can go missing because of a data type mismatch or for some other reason?