We're ingesting a large amount of data from an Oracle database into Elasticsearch through Logstash.
We import some Oracle table data into different arrays in Elasticsearch: each table corresponds to an array, and each row corresponds to an element of that array.
Sometimes it imports, say, 5 rows from a table and produces a 5-element array.
Other times it imports those same 5 rows but only 2 or 3 array elements show up in Elasticsearch.
This behavior is random, and it happens with arrays holding different types of information.
Using the stdout plugin I can see that all 5 elements of the array do reach the output stage in Logstash.
What could be happening so that they don't all appear in Elasticsearch sometimes?
Could it be that they are not sent to Elasticsearch at all, or that Elasticsearch fails to index them because of the large load being processed?
I don't see any issue reported in the Logstash logs.
Do you have an idea what the issue could be? Even if not,
how can I check what is actually reaching Elasticsearch, and what can I tune in the output plugin or in the Logstash/Elasticsearch config files?
Any hint would be much appreciated as we're running out of options.
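One quick way to compare what reached Elasticsearch against the source rows is to count the documents in the index directly, for example from Kibana Dev Tools (the index name below is just a placeholder):

```
GET my-index/_count

GET my-index/_search
{
  "query": { "match_all": {} },
  "size": 10
}
```

Comparing the `_count` result against the number of rows returned by the Oracle query after each run shows whether documents are missing on the Elasticsearch side or were never sent.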
Please share your Logstash configuration and a sample of your message.
Depending on what the data looks like and on the index mapping, this could lead to some documents not being indexed; but if Elasticsearch has a problem indexing a document, you would get a log line in the Logstash log.
You have nothing in the Logstash logs?
Can you share an example of what those different types look like?
Are you using a jdbc input? That will return each row as a separate event. If you are combining those events into a single event that holds an array of rows then you will need to run with --pipeline.workers 1, otherwise the rows could be spread across multiple worker threads.
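A minimal sketch of what that kind of pipeline might look like, assuming a hypothetical `order_lines` table grouped into a `lines` array (table, field, and connection details are illustrative, not from the original post):

```conf
# Must run single-threaded for aggregate to see all rows of a task:
#   bin/logstash -f pipeline.conf --pipeline.workers 1
input {
  jdbc {
    jdbc_connection_string => "jdbc:oracle:thin:@//dbhost:1521/ORCL"  # hypothetical
    jdbc_user => "user"
    statement => "SELECT order_id, line_no, item FROM order_lines"    # hypothetical
  }
}
filter {
  aggregate {
    task_id => "%{order_id}"
    code => "
      map['lines'] ||= []
      map['lines'] << { 'line_no' => event.get('line_no'), 'item' => event.get('item') }
      event.cancel
    "
    push_map_as_event_on_timeout => true
    timeout => 10
  }
}
output {
  elasticsearch { hosts => ["http://localhost:9200"] index => "orders" }
  stdout { codec => rubydebug }
}
```

With more than one worker, rows sharing the same `task_id` can land on different threads, so some of them never make it into the map before it is pushed, which would match the symptom of only 2 or 3 of the 5 elements appearing.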
Thanks Badger. We have pipeline.workers: 5. I had thought about something of that sort, and I tested with just 1 worker; it gets slow as expected, but the main issue is that I was getting read timeouts on the input plugin with that. Do you have any suggestion to avoid that timeout with just 1 worker?
Here's a sample of the config file we're using:
As @Badger mentioned, you need to set pipeline.workers to 1 if you are using the aggregate filter; it is required for the aggregate filter to work. And yes, one of the drawbacks is that it makes processing slower, since you are using only one core.
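If running the whole instance with one worker is too slow, one possible compromise (a sketch, assuming the aggregate logic can be split into its own pipeline; ids and paths below are hypothetical) is to set a single worker only for that pipeline in pipelines.yml, so the rest of your ingest keeps its parallelism:

```yaml
# pipelines.yml (sketch; ids and paths are hypothetical)
- pipeline.id: oracle-aggregate
  path.config: "/etc/logstash/conf.d/oracle_aggregate.conf"
  pipeline.workers: 1        # required for the aggregate filter
- pipeline.id: other-ingest
  path.config: "/etc/logstash/conf.d/other.conf"
  pipeline.workers: 4        # other pipelines keep multiple workers
```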
Can you share a log of this timeout? The number of workers should only have an impact on the filter and output blocks, not on the input block.