How to increase the performance of Logstash when exporting to MongoDB

Hi,
I'm using Logstash 5.6.13.
I'm trying to export data from a CSV file (975 MB, containing 1 crore (10 million) rows) to MongoDB using Logstash. My system has 16 GB RAM, a 320 GB HDD, and a Core i7 processor. Logstash takes 15 minutes to export the data from the CSV file to MongoDB, and the resulting database size is 2.5 GB.

I need to export this file to MongoDB within 4 minutes.

How can I increase the performance of Logstash when writing to MongoDB?

It generally helps if you show your config. Assuming you are using the MongoDB output plugin (which I have never used), what attempts at tuning it have you made? What throughput are you seeing?

If you provide this type of information, it may be easier for someone who has used this plugin to help.

The attached file is my configuration.
System spec:

OS: Ubuntu

RAM: 16GB

Processor: i7

Xms: 2g

Xmx: 12g

**Output:**

It takes a total of 15 minutes to export the CSV file (910 MB, containing 1 crore (10 million) documents) to MongoDB.

That is about 10,000 documents per second.

I need to improve the throughput; I need to export at least 25,000 documents per second.

(Attachment cdrconf.conf is missing)

I am not able to see the config as the attachment is missing. Maybe you can upload it as a gist and share the link to it here instead?

The first thing I would recommend is to determine whether the Mongodb output plugin is the bottleneck. The easiest way to do this is probably to comment it out and add a fast output, e.g. a simple file output (if you have fast storage). If throughput with this other output increases, it is likely not the pipeline that is the limiting factor, and you can then start optimising the Mongodb output.
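As a rough sketch of what that benchmark could look like (the output path is only a placeholder), the output section might be swapped for something like:

```
output {
  # mongodb output temporarily disabled while benchmarking the rest of the pipeline
  # mongodb { ... }

  # simple file output as a fast baseline; the path is just an example
  file {
    path => "/tmp/logstash-throughput-test.log"
  }
}
```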

If you still see the same throughput, I would look into the pipeline configuration and see if it can be made faster. I have seen reports that using the dissect filter to parse CSV files can be faster than using the csv filter or grok, so that might be worth trying.
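For reference, a dissect-based filter for a comma-separated line could look roughly like this (the field names are placeholders, since the actual columns depend on your data):

```
filter {
  # dissect splits on literal delimiters and avoids the regex work done by grok
  dissect {
    mapping => {
      "message" => "%{field1},%{field2},%{field3},%{field4}"
    }
  }
}
```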

This is my configuration file:

```
input {
  file {
    path => "/home/unni/Downloads/logstash-5.6.13/File/CDR.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

filter {
  csv {
    separator => ","
    columns => [ "Phone Number", "Called Number", "IMEI", "IMSI", "Call_Type", "Date", "Time", "SlNo", "CDR" ]
  }

  if [Call_Type] == "1" {
    mutate {
      replace => { "Call_Type" => "Incoming Call" }
    }
  }
  else if [Call_Type] == "2" {
    mutate {
      replace => { "Call_Type" => "Outgoing Call" }
    }
  }
  else if [Call_Type] == "3" {
    mutate {
      replace => { "Call_Type" => "Incoming SMS" }
    }
  }
  else if [Call_Type] == "4" {
    mutate {
      replace => { "Call_Type" => "Outgoing SMS" }
    }
  }

  mutate {
    remove_field => [ "message", "path", "host", "Date", "Time", "%{@timestamp}", "%{@host}" ]
    add_field => { "DateTime" => "%{Date} %{Time}" }
  }
}

output {
  mongodb {
    bulk => "true"
    bulk_size => "1000"
    collection => "E"
    database => "CDR"
    uri => "mongodb://localhost:27017"
    codec => "json"
  }
}
```

Did you test the throughput without the MongoDB output plugin?

Yes sir...
There is no change in speed without the MongoDB plugin.

Converting the same CSV file to another CSV file takes the same amount of time (15 minutes).

Does the performance of Logstash depend on the system configuration, or are there other factors?

The config looks simple, so I do not see much room for improvement in the filters. It may be the file input plugin or the performance of the storage that is limiting throughput. What kind of storage do you have? What do disk I/O and iowait look like? Does it make any difference if you split the file into two and configure two separate file inputs, along the lines sketched below?
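For that last test, the input section could look something like this (assuming you split the CSV into two halves; the file names are only examples):

```
input {
  # first half of the CSV (example file name)
  file {
    path => "/home/unni/Downloads/logstash-5.6.13/File/CDR_part1.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
  # second half of the CSV (example file name)
  file {
    path => "/home/unni/Downloads/logstash-5.6.13/File/CDR_part2.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
```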

On this site I saw a question related to Logstash performance.

It says that the system had 224 GB RAM, a 30 GB disk, and 32 cores, and that exporting 11 GB of data took only 10 minutes. My system takes 15 minutes to process 918 MB of data. I think the speed is related to the system configuration. Is that right, sir? Is there a connection between the system configuration and Logstash performance?

Did you look at disk I/O and/or try two separate file inputs?

Yes, I already tried that, but there is no change in performance; the speed remains the same.

What did disk I/O look like? Could disk speed be the bottleneck? What type of disk/storage do you have?

My system does not have an SSD; it has a SATA hard disk.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.