Hi...
I'm using Logstash 5.6.13.
I'm trying to export data from a CSV file (size 975 MB, containing 1 crore, i.e. 10 million, records) to MongoDB using Logstash. My system has 16 GB RAM, a 320 GB HDD, and a Core i7 processor. Logstash takes 15 minutes to export the data from the CSV file to MongoDB, and the resulting database size is 2.5 GB.
I need to export this file to MongoDB within 4 minutes.
How can I increase the performance of Logstash when writing to MongoDB?
It generally helps if you show your config. Assuming you are using the MongoDB output plugin (which I have never used), what attempts at tuning have you gone through? What throughput are you seeing?
If you provide this type of information, it may be easier for someone who has used it to help.
I am not able to see the config as the attachment is missing. Maybe you can upload it as a gist and share the link to it here instead?
The first thing I would recommend is to determine whether the MongoDB output plugin is the bottleneck. The easiest way to do this is probably to comment it out and add a fast output, e.g. a simple file output (if you have fast storage). If throughput with this other output increases, it is likely not the pipeline that is the limiting factor, and you can then start optimising the MongoDB output.
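A minimal sketch of such a test, assuming a pipeline that currently ends in a mongodb output (the URI, database, collection, and file path here are placeholders, not your actual settings):

```conf
output {
  # Temporarily disabled while benchmarking the rest of the pipeline:
  # mongodb {
  #   uri        => "mongodb://localhost:27017"
  #   database   => "mydb"
  #   collection => "mycollection"
  # }

  # Fast throwaway output, used only to measure pipeline throughput:
  file {
    path => "/tmp/logstash-throughput-test.log"
  }
}
```

If the file output finishes the 975 MB file much faster than 15 minutes, the MongoDB output is where the time is going; if it takes about as long, look at the input and filters instead.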
If you still see the same throughput, I would look into the pipeline configuration and see if it can be made faster. I have seen reports that using the dissect filter to parse CSV files can be faster than using the csv filter or grok, so that might be something worth trying.
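As a sketch, assuming a simple comma-separated line with hypothetical field names (adjust the mapping to match your actual columns):

```conf
filter {
  # dissect splits on literal delimiters instead of regex matching,
  # which is often faster than csv or grok for fixed layouts.
  # Example line: "1,John,42" -> id=1, name=John, age=42
  dissect {
    mapping => {
      "message" => "%{id},%{name},%{age}"
    }
  }
}
```

Note that dissect only works well when every line has the same fixed structure; quoted fields containing embedded commas would still need the csv filter.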
The config looks simple, so I do not see much room for improvement in the filters. It may be the file input plugin or the performance of the storage that is limiting throughput. What kind of storage do you have? What do disk I/O and iowait look like? Does it make any difference if you split the file in two and configure 2 separate file inputs?
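A sketch of the two-input variant, assuming the CSV has been split into two halves beforehand (e.g. with `split -n 2 data.csv part_`; the paths here are placeholders):

```conf
input {
  # Two file inputs reading the two halves in parallel:
  file {
    path           => "/path/to/part_aa"
    start_position => "beginning"
    sincedb_path   => "/dev/null"  # do not remember position between runs
  }
  file {
    path           => "/path/to/part_ab"
    start_position => "beginning"
    sincedb_path   => "/dev/null"
  }
}
```

If this roughly doubles throughput, the single file input was the bottleneck; if not, the limit is further down the pipeline or in the storage itself.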
On this site, I saw a question related to Logstash performance,
which says that the system had 224 GB RAM, a 30 GB HD, and 32 cores, and needed only 10 minutes to export 11 GB of data. My system takes 15 minutes to process 918 MB of data. I think the speed is related to the system configuration? Is that right, sir? Is there a connection between system configuration and Logstash performance?