I have reduced pipeline.workers to 1 because I need the output in the correct order. The problem now is that when I send 10k requests (100 per second), not all of them reach Elasticsearch; only about 9k arrive. This did not happen earlier. Could this be caused by pipeline.workers? How can I get better performance while still keeping the output in the correct order?
In order to get everything in the correct order you need to process it all in a single thread, which as you can see dramatically limits throughput. The first question to ask is why you need to maintain the order. If you are unable to relax this, does ordering have to apply to all data? Might it perhaps be possible to partition the data based on some criteria in the data and send it through a number of pipelines?
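As a sketch of the partitioning idea, a pipelines.yml along these lines (the pipeline ids and config paths are made up) would give each partition its own single-worker pipeline, so ordering is preserved within a partition while the partitions run in parallel:

```yaml
# pipelines.yml -- one single-worker pipeline per data partition.
# Ordering is preserved inside each partition; the partitions
# themselves run in parallel.
- pipeline.id: partition_a
  pipeline.workers: 1
  path.config: "/etc/logstash/conf.d/partition_a.conf"
- pipeline.id: partition_b
  pipeline.workers: 1
  path.config: "/etc/logstash/conf.d/partition_b.conf"
```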
This data is for the ML team, which uses a time-series model as well.
My flow of data is given below:
Filebeat -> Logstash -> Elasticsearch -> Logstash -> CSV file [where I need the data in proper order with respect to time]
I came across the idea that I can set the number of workers on the command line instead of in logstash.yml. So while exporting data from Elasticsearch to CSV, I give 1 worker to that particular Logstash instance. Is this approach better, or should I try something else?
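As a sketch, that could look like this on the command line (the config file names are my assumption; `-w` sets pipeline.workers for just that instance):

```sh
# Export instance: a single worker so events are written to the CSV in order.
bin/logstash -w 1 -f export_to_csv.conf

# Indexing instance: started separately with the default worker count.
# (Separate instances need distinct --path.data directories.)
bin/logstash -f index_to_es.conf
```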
If you are sorting when you read data from Elasticsearch, why do you need to maintain order when writing?
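For example, the outbound pipeline could ask Elasticsearch for the hits already sorted; something along these lines, where the hosts, index name, output path, and field list are placeholders:

```
input {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "my-index"
    # Request the hits already ordered by time.
    query => '{ "sort": [ { "@timestamp": { "order": "asc" } } ], "query": { "match_all": {} } }'
  }
}
output {
  csv {
    path   => "/tmp/export.csv"
    fields => ["@timestamp", "message"]
  }
}
```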
It is true that in Elasticsearch the data will be sorted based on the timestamp, but while writing into the CSV, if 2 workers are present, the order is not maintained in the same way as it is in Elasticsearch. I also want to clarify that it is the CSV file that will be sent to the Machine Learning team, hence order is required while writing.
Is the performance problem related to exporting data from Elasticsearch or indexing into it?
Indexing into Elasticsearch.
Why do you need to maintain order when writing to Elasticsearch?
I do not understand why this is required given that you can sort when you extract the data. Can you please explain?
Let me rephrase the whole scenario.
Filebeat -> Logstash (inbound) -> Elasticsearch -> Logstash (outbound) -> CSV file [where I need the data in proper order with respect to time]
I need to bulk import data into Elasticsearch, where order does not matter to me as it will be arranged on the basis of the timestamp.
Once my data is indexed, I want to export it to CSV. In the CSV I want the data written in either increasing or decreasing order of @timestamp.
But with multiple pipeline workers it is not ordered by time. My problem is only while exporting.
To solve this problem I reduced pipeline.workers to 1 in logstash.yml, and now I am facing performance issues in indexing as well.
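I am now wondering whether per-pipeline settings in pipelines.yml could keep indexing fast while only the export stays single-threaded. Something like this, where the ids and config paths are my guesses:

```yaml
# pipelines.yml -- per-pipeline worker counts instead of a global setting.
- pipeline.id: inbound_indexing
  path.config: "/etc/logstash/conf.d/inbound.conf"
  # no pipeline.workers here, so it defaults to the number of CPU cores
- pipeline.id: outbound_csv
  pipeline.workers: 1   # single worker only where ordering matters
  path.config: "/etc/logstash/conf.d/outbound.conf"
```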