I am new to Elasticsearch. I am trying to build a reporting application with Elasticsearch as the data store. I receive input files, index them into ES via Logstash, and then search/filter in ES for the report output. The issue I'm facing is that data loading is very slow (10k docs per minute, whereas some tutorials show 20k Apache docs per second). Obviously I'm missing something here; please help me catch up and make the processing faster.
My system has 8 GB RAM and a 4-core processor. The JVM heap is configured to 2 GB for both Logstash and ES. The input file contains 30 million pipe-separated docs and is 3.5 GB in size. Data is being loaded into the default 5 shards, using 4 workers (confirmed with the metrics filter). The Logstash config is like the below one.
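(The poster's actual config is not shown here. As a rough illustration only, a Logstash pipeline for a pipe-separated input file typically has the shape below; the file path, column names, and index name are placeholders, and the `hosts` option assumes Logstash 2.x or later.)

```
input {
  file {
    path => "/data/input.txt"        # placeholder path to the pipe-separated file
    start_position => "beginning"
  }
}

filter {
  csv {
    separator => "|"                 # pipe-delimited fields
    columns => ["col1", "col2", "col3"]  # placeholder column names
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "reports"               # placeholder index name
  }
}
```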
The speed improved from 10,000 to 12,000 docs per minute, so removing stdout did not have much of an impact. Is there any other major thing I'm missing here?
Which version of Elasticsearch and Logstash are you using? What is the average size of a record? What is resource utilisation looking like on the node during indexing, specifically around CPU usage and disk IO?
Do you mean that the disk IO rate of the machine is slow, which in turn affects ES performance? If that is so, note that I am able to load a tad faster with Microsoft's SSIS ETL tool on the very same machine (1 million records in 3.5 minutes).
Is the SQL Server database also on your laptop? Have you increased the refresh interval on the index you are indexing into? You can also try increasing the pipeline batch size, e.g. to 1000, to see if this makes a difference.
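For example (assuming the target index is named "reports" and ES is on localhost:9200 — adjust to your environment), the two suggestions above could look like:

```shell
# Relax the refresh interval during bulk loading (the default is 1s).
# Set it back to "1s" once the load is done.
curl -XPUT 'http://localhost:9200/reports/_settings' -d '
{
  "index": { "refresh_interval": "30s" }
}'

# Start Logstash with a larger pipeline batch size (-b)
# and an explicit pipeline worker count (-w).
bin/logstash -f pipeline.conf -b 1000 -w 4
```

A larger batch size means bigger bulk requests to Elasticsearch, which usually improves throughput at the cost of more memory per batch.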