How can I debug what influence the write-in speed from logstash:input:file to es?

Morriaty · November 25, 2016, 10:18am

Hi

I found it very slow when writing to es.
And I've made a test

input {
   file  {
      path => "access.log"
   }
}
output {
    elasticsearch {
        index => "ur_test_%{+YYYY.MM.dd}"
        
        document_type => "k_gif"
        flush_size => 1000
        idle_flush_time => 3
        hosts => ["10.10.22.13:9200"]
    }
    
}

# my server memory is limited, so I set the workers to 4
./logstash -f test.conf -w 4

The raw log write-in speed is near 110 lines per second. But the es index increase about only 50 docs per 3 seconds.

Christian_Dahlqvist · November 25, 2016, 10:35am

What versions of Logstash and Elasticsearch are you using? What is the specification of the hosts running Logstash and Elasticsearch?

Morriaty · November 25, 2016, 10:54am

Hi @Christian_Dahlqvist

I am using logstash-2.3.1 and es-2.3.2. The hosts are in the same datacenter, but different vlans, from 10.10.20.0/255.255.255.0 to 10.10.22.0/255.255.255.0.

I suppose it's not network problem.

Christian_Dahlqvist · November 25, 2016, 10:55am

How much CPU and RAM do the hosts have? What type of storage are you using? How are you measuring indexing throughput?

Morriaty · November 25, 2016, 11:05am

The Logstash host have 16G memory and 16 cores AMD 6128, but the free memory is about only 3G.

I'm not clear with the es cluster's hardware. But there are many other logs writing to es, it is the cluster operation engineer told me that my log write-in speed was much slower than others.

Christian_Dahlqvist · November 25, 2016, 11:15am

I would recommend upgrading to Logstash 2.4 and try increasing the number of elasticsearch output workers to 4 (the same number as your number of worker threads).

If you do not have access to the Elasticsearch cluster to measure throughput, you can try introducing the metrics filter and write this to e.g. a file using conditionals.

Morriaty · November 28, 2016, 10:46am

This is the test.conf

input {
      generator {
        type => "generated"
      }
    }

    filter {
      if [type] == "generated" {
        metrics {
          meter => "events"
          add_tag => "metric"
        }
      }
    }

    output {
      
      if "metric" in [tags] {
        stdout {
          codec => line {
            format => "rate: %{[events][rate_1m]}"
          }
        }
      }

    elasticsearch {
          index => "ur_test_%{+YYYY.MM.dd}"
          document_type => "kkk"
          hosts => ["10.10.22.13:8200"]
          workers => 16
        }
    }

If I test without output to es, the rate is about 20000/min

$ ./logstash -f conf/test.conf 
Settings: Default pipeline workers: 16
Pipeline main started
rate: 0.0
rate: 18273.2
rate: 18615.553825440165
rate: 19008.827059558338
rate: 19918.175759533293
rate: 20440.175732344997
rate: 21039.14095380837
rate: 21438.843646149202
rate: 21849.22019386933
rate: 22468.25071231354
rate: 22695.56038678966
rate: 22992.998331666

If writing to es, the rate decreased deeply to 300/min

$ ./logstash -f conf/test.conf
Settings: Default pipeline workers: 16
Pipeline main started
rate: 399.4
rate: 383.45685627708707
rate: 364.7906766746546
rate: 351.6147416575027
rate: 331.4967377003855

And I've checked the network

$ ping 10.10.22.13
PING 10.10.22.13 (10.10.22.13) 56(84) bytes of data.
64 bytes from 10.10.22.13: icmp_seq=1 ttl=63 time=0.133 ms
64 bytes from 10.10.22.13: icmp_seq=2 ttl=63 time=0.104 ms
64 bytes from 10.10.22.13: icmp_seq=3 ttl=63 time=0.123 ms
64 bytes from 10.10.22.13: icmp_seq=4 ttl=63 time=0.107 ms
64 bytes from 10.10.22.13: icmp_seq=5 ttl=63 time=0.099 ms
64 bytes from 10.10.22.13: icmp_seq=6 ttl=63 time=0.232 ms
64 bytes from 10.10.22.13: icmp_seq=7 ttl=63 time=0.121 ms
64 bytes from 10.10.22.13: icmp_seq=8 ttl=63 time=0.201 ms
64 bytes from 10.10.22.13: icmp_seq=9 ttl=63 time=0.100 ms
64 bytes from 10.10.22.13: icmp_seq=10 ttl=63 time=0.105 ms
64 bytes from 10.10.22.13: icmp_seq=11 ttl=63 time=0.324 ms

And the cluster op engineer have told me the es cluster's average rate is 8000/sec

Christian_Dahlqvist · November 28, 2016, 10:56am

I believe the metrics shown are number of documents processed per second measured over a period of 1 minute, not documents per minute. Based on the information you have provided it would seem it is possible that Elasticsearch is the bottleneck. The fact that it is processing 8000 events/second does not necessarily mean it has any spare capacity to process additional events.

system · December 26, 2016, 10:57am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Slow writing to Elasticsearch Elasticsearch	2	4969	May 24, 2018
How to improve the speed of Logstash output to Elasticsearch? Logstash	8	1151	December 1, 2019
Elasticsearch/Logstash low indexing rate Elasticsearch	6	486	July 13, 2020
Logstash performance for indexing to ES Logstash	6	671	April 18, 2017
How to tune Logstash for ES indexing speed Logstash	3	1476	October 4, 2018

How can I debug what influence the write-in speed from logstash:input:file to es?

Related topics