How to increase the performance of Logstash when exporting to MongoDB

Hi,
I'm using Logstash 5.6.13.
I'm trying to export data from a CSV file (975 MB, containing 1 crore (10 million) rows) to MongoDB using Logstash. My system has 16 GB RAM, a 320 GB HDD, and a Core i7 processor. Logstash takes 15 minutes to export the data from the CSV file to MongoDB, and the resulting database size is 2.5 GB.

I need to export this file to MongoDB within 4 minutes.

How can I increase the performance of Logstash when writing to MongoDB?

It generally helps if you show your config. Assuming you are using the MongoDB output plugin (which I have never used), what attempts at tuning it have you made? What throughput are you seeing?

If you provide this type of information, it may be easier for someone who has used this plugin to help.

The attached file is my configuration.
System spec:

OS: Ubuntu

RAM: 16GB

Processor: i7

Xms: 2g

Xmx: 12g

**Output:**

It takes a total of 15 minutes to export the CSV file (910 MB, containing 1 crore (10 million) documents) to MongoDB.

That is about 10,000 documents per second.

I need to improve the throughput; I need to export at least 25,000 documents per second.

(Attachment cdrconf.conf is missing)

I am not able to see the config as the attachment is missing. Maybe you can upload it as a gist and share the link to it here instead?

The first thing I would recommend is to determine whether the Mongodb output plugin is the bottleneck. The easiest way to do this is probably to comment it out and add a fast output, e.g. a simple file output (if you have fast storage). If throughput with this other output increases, it is likely not the pipeline that is the limiting factor, and you can then start optimising the Mongodb output.
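As a rough sketch of what that benchmark could look like (the output path is only a placeholder), the output section might be swapped for something like:

```
output {
  # mongodb output temporarily disabled while benchmarking the rest of the pipeline
  # mongodb { ... }

  # simple file output as a fast baseline; the path is just an example
  file {
    path => "/tmp/logstash-throughput-test.log"
  }
}
```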

If you still see the same throughput, I would look into the pipeline configuration and see if it can be made faster. I have seen reports that using the dissect filter to parse CSV files can be faster than using the csv filter or grok, so that might be worth trying.
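For reference, a dissect-based filter for a comma-separated line could look roughly like this (the field names are placeholders, since the actual columns depend on your data):

```
filter {
  # dissect splits on literal delimiters and avoids the regex work done by grok
  dissect {
    mapping => {
      "message" => "%{field1},%{field2},%{field3},%{field4}"
    }
  }
}
```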

This is my configuration file:

```
input {
  file {
    path => "/home/unni/Downloads/logstash-5.6.13/File/CDR.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

filter {
  csv {
    separator => ","
    columns => [ "Phone Number", "Called Number", "IMEI", "IMSI", "Call_Type", "Date", "Time", "SlNo", "CDR" ]
  }

  if [Call_Type] == "1" {
    mutate {
      replace => { "Call_Type" => "Incoming Call" }
    }
  }
  else if [Call_Type] == "2" {
    mutate {
      replace => { "Call_Type" => "Outgoing Call" }
    }
  }
  else if [Call_Type] == "3" {
    mutate {
      replace => { "Call_Type" => "Incoming SMS" }
    }
  }
  else if [Call_Type] == "4" {
    mutate {
      replace => { "Call_Type" => "Outgoing SMS" }
    }
  }

  mutate {
    remove_field => [ "message", "path", "host", "Date", "Time", "%{@timestamp}", "%{@host}" ]
    add_field => { "DateTime" => "%{Date} %{Time}" }
  }
}

output {
  mongodb {
    bulk => "true"
    bulk_size => "1000"
    collection => "E"
    database => "CDR"
    uri => "mongodb://localhost:27017"
    codec => "json"
  }
}
```

Did you test the throughput without the MongoDB output plugin?

Yes sir...
There is no change in speed without the MongoDB plugin.

Converting the same CSV file to another CSV file takes the same amount of time (15 minutes).

Does the performance of Logstash depend on the system configuration, or are there other factors?

The config looks simple, so I do not see much room for improvement in the filters. It may be the file input plugin or the performance of the storage that is limiting throughput. What kind of storage do you have? What do disk I/O and iowait look like? Does it make any difference if you split the file into two and configure two separate file inputs, along the lines sketched below?
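For that last test, the input section could look something like this (assuming you split the CSV into two halves; the file names are only examples):

```
input {
  # first half of the CSV (example file name)
  file {
    path => "/home/unni/Downloads/logstash-5.6.13/File/CDR_part1.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
  # second half of the CSV (example file name)
  file {
    path => "/home/unni/Downloads/logstash-5.6.13/File/CDR_part2.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
```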

On this site I saw a question related to Logstash performance.

It says that the system had 224 GB RAM, a 30 GB disk, and 32 cores, and that exporting 11 GB of data took only 10 minutes. My system takes 15 minutes to process 918 MB of data. I think the speed is related to the system configuration. Is that right, sir? Is there a connection between the system configuration and Logstash performance?

Did you look at disk I/O and/or try two separate file inputs?

Yes, I already tried that, but there is no change in performance; the speed remains the same.

What did disk I/O look like? Could disk speed be the bottleneck? What type of disk/storage do you have?

My system does not have an SSD; it has a SATA hard disk.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.