Pipeline workers configuration

ritusingh · February 2, 2021, 7:23am

Hi,
I have reduced pipeline.workers to 1 because I need the output in the correct order. The problem now is that when I send 10k requests (100 per second), not all of them reach Elasticsearch; only about 9k do. It was not like this earlier. Could this be because of pipeline.workers? How can I get better performance while keeping the output in the correct order?

Christian_Dahlqvist · February 2, 2021, 8:06am

In order to get everything in the correct order you need to process it all in a single thread, which as you can see dramatically limits throughput. The first question to ask is why you need to maintain the order. If you are unable to relax this, does ordering have to apply to all data? Might it perhaps be possible to partition the data based on some criteria in the data and send it through a number of pipelines?
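To illustrate the partitioning idea, here is a minimal sketch in Python rather than Logstash configuration. The `host` key, the `seq` field, and the `partition` helper are all hypothetical and only show the principle: if every event with the same key goes to the same single-threaded pipeline, order is preserved within each key even though the pipelines run in parallel.

```python
import zlib
from collections import defaultdict


def partition(events, key, num_pipelines):
    """Route each event to one of num_pipelines buckets by hashing event[key].

    Events sharing a key value always land in the same bucket, in arrival
    order, so each bucket could be handled by its own single-worker pipeline.
    """
    buckets = defaultdict(list)
    for event in events:
        # crc32 is used because it is stable across runs (unlike built-in hash()).
        idx = zlib.crc32(str(event[key]).encode("utf-8")) % num_pipelines
        buckets[idx].append(event)
    return buckets


# Hypothetical events: "host" is the partition key, "seq" is the arrival order.
events = [
    {"host": "web-1", "seq": 1},
    {"host": "web-2", "seq": 2},
    {"host": "web-1", "seq": 3},
    {"host": "db-1", "seq": 4},
    {"host": "web-2", "seq": 5},
]

for idx, bucket in sorted(partition(events, "host", 3).items()):
    print(idx, [(e["host"], e["seq"]) for e in bucket])
```

This only helps if ordering is required per key and not across the whole data set; a global order still needs a single thread or a sort afterwards.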

ritusingh · February 2, 2021, 10:09am

Hi Christian,
This data is for the ML team, whose work includes time-series models.
My flow of data is given below:
Filebeat -> logstash -> elasticsearch -> logstash -> csv file [where I need the data in proper order with respect to time]
I came across an idea: I can set the number of workers on the command line instead of in logstash.yml. So when exporting data from Elasticsearch to CSV, I give that particular Logstash instance 1 worker. Is this approach better, or should I try something else?

Christian_Dahlqvist · February 2, 2021, 11:01am

If you are sorting when you read data from Elasticsearch, why do you need to maintain order when writing?

ritusingh · February 2, 2021, 11:05am

It is true that in Elasticsearch the data will be sorted by timestamp, but when writing to CSV with 2 workers, the order is not maintained the way it is in Elasticsearch. I also want to make clear that it is the CSV file that will be sent to the machine learning team, so the order is required when writing.

Christian_Dahlqvist · February 2, 2021, 11:08am

Is the performance problem related to exporting data from Elasticsearch or indexing into it?

ritusingh · February 2, 2021, 11:09am

Indexing into Elasticsearch.

Christian_Dahlqvist · February 2, 2021, 11:13am

Why do you need to maintain order when writing to Elasticsearch?

I do not understand why this is required given that you can sort when you extract the data. Can you please explain?

ritusingh · February 2, 2021, 11:21am

Let me rephrase the whole scenario.

Filebeat -> logstash (inbound) -> elasticsearch -> logstash (outbound) -> csv file [where I need the data in proper order with respect to time]

I need to bulk import data into Elasticsearch, where order does not matter to me because the data will be arranged on the basis of the timestamp.
Once my data is indexed, I want to export it to CSV. In the CSV, I want the data written in either increasing or decreasing order of @timestamp.
But with multiple pipeline workers it is not ordered by time. My problem is only with exporting.

To solve this problem I reduced pipeline.workers to 1 in logstash.yml, and now I am facing performance issues in indexing as well.
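One workaround that was not raised in this thread is to leave the export running with several workers and sort the CSV file afterwards. A minimal sketch in Python follows, under several assumptions: the file has no header row, the timestamp sits in the first column (`ts_index=0`), timestamps are ISO 8601 strings such as `2021-02-02T07:23:00.000Z`, and the whole file fits in memory. The function names are made up for illustration.

```python
import csv
from datetime import datetime


def parse_ts(value):
    # Assumes ISO 8601 timestamps; a trailing "Z" is rewritten as "+00:00"
    # so that datetime.fromisoformat accepts it on older Python versions.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def sort_rows(rows, ts_index=0, descending=False):
    """Return the rows sorted by the timestamp found in column ts_index."""
    return sorted(rows, key=lambda row: parse_ts(row[ts_index]), reverse=descending)


def sort_csv_file(src_path, dst_path, ts_index=0, descending=False):
    """Read a header-less CSV, sort it by timestamp, and write the result."""
    with open(src_path, newline="", encoding="utf-8") as src:
        rows = list(csv.reader(src))
    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        csv.writer(dst).writerows(sort_rows(rows, ts_index, descending))
```

Calling `sort_csv_file("export.csv", "export_sorted.csv")` (both file names are placeholders) would produce an ascending file; passing `descending=True` reverses the order.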

system · March 2, 2021, 11:21am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
