Hi,
I am trying to set up a 3-node Elasticsearch cluster with the configuration below, but I am not able to bring the time taken to push the logs to Elasticsearch down to an acceptable level.
Configuration used:
Servers: Ubuntu 16.10
ELK version: 5.3.0
Hardware: 2 servers with 8GB RAM and 4 CPUs each, and 1 server with 4GB RAM and 4 CPUs.
File size used for testing: 1.1 GB
Configuration file:
input {
  stdin { }
}
filter {}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => [ "x.x.x.x", "y.y.y.y", "z.z.z.z" ]
    index => "testing"
  }
}
Following are the results of the testing done on individual servers with and without fine tuning:
Case 1: No ES Tuning
{
  "testing": {
    "settings": {
      "index": {
        "refresh_interval": "1s",
        "number_of_shards": "5",
        "provided_name": "testing",
        "creation_date": "1486813906020",
        "number_of_replicas": "1",
        "uuid": "L5yI-HvmT2yTyOJcy-8BIQ",
        "version": {
          "created": "5010199"
        }
      }
    }
  }
}
# time cat file.txt | /usr/share/logstash/bin/logstash -w 4 -b 10000 -f test.conf
RESULT:
real 25m50.425s
user 15m40.448s
sys 1m5.580s
Case 2.0: With ES tuning (buffer size increased to 40%, mlock set to true, Java heap size = 4GB, index refresh_interval = 30s)
{
  "testing": {
    "settings": {
      "index": {
        "refresh_interval": "30s",
        "number_of_shards": "5",
        "provided_name": "testing",
        "creation_date": "1486813906020",
        "number_of_replicas": "1",
        "uuid": "L5yI-HvmT2yTyOJcy-8BIQ",
        "version": {
          "created": "5010199"
        }
      }
    }
  }
}
# time cat file.txt | /usr/share/logstash/bin/logstash -w 4 -b 10000 -f test.conf
RESULT:
real 19m56.375s
user 13m30.492s
sys 0m53.648s
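For reference, the index-level part of this tuning (the refresh interval and the replica count) can be changed on the existing index through the update index settings API. Below is a minimal sketch of how that could be done, assuming a node is reachable at http://localhost:9200; the URL and the helper names are only illustrative, and the heap size, buffer size and mlock changes are node-level settings that this does not cover.

```python
import json
import urllib.request

# Assumption: an Elasticsearch node answers on the default HTTP port locally.
ES_URL = "http://localhost:9200"


def build_settings_body(refresh_interval, number_of_replicas):
    """Build the JSON body for PUT /<index>/_settings (hypothetical helper)."""
    return {
        "index": {
            "refresh_interval": refresh_interval,
            "number_of_replicas": number_of_replicas,
        }
    }


def update_index_settings(index, body, es_url=ES_URL):
    """Send the settings to /<index>/_settings and return the parsed reply."""
    request = urllib.request.Request(
        url=f"{es_url}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Values used in Case 2.0: 30s refresh interval, 1 replica.
    body = build_settings_body("30s", 1)
    print(json.dumps(body, indent=2))
    # Uncomment only when a node is actually reachable at ES_URL:
    # print(update_index_settings("testing", body))
```

Both settings are dynamic, so the index does not need to be recreated between test runs.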
Case 2.1: With a 60s refresh interval, no replicas, worker threads increased to 8 and batch size (-b) increased to 100000
{
  "testing": {
    "settings": {
      "index": {
        "refresh_interval": "60s",
        "number_of_shards": "5",
        "provided_name": "testing",
        "creation_date": "1486813906020",
        "number_of_replicas": "0",
        "uuid": "L5yI-HvmT2yTyOJcy-8BIQ",
        "version": {
          "created": "5010199"
        }
      }
    }
  }
}
# time cat file.txt | /usr/share/logstash/bin/logstash -w 8 -b 100000 -f test.conf
RESULT:
real 16m12.258s
user 10m9.192s
sys 0m34.708s
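To make the three single-server runs easier to compare, the "real" times above can be turned into throughput and speedup figures. This is only a rough sketch: it assumes "1.1 GB" means 1100 MB, and the function names are made up for illustration.

```python
import re

# Assumption: the 1.1 GB test file is read as 1100 MB (decimal units).
FILE_SIZE_MB = 1100.0

# "real" wall-clock times copied from the three runs above.
REAL_TIMES = {
    "Case 1": "25m50.425s",
    "Case 2.0": "19m56.375s",
    "Case 2.1": "16m12.258s",
}


def parse_time(text):
    """Convert the shell `time` format, e.g. '25m50.425s', to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unexpected time format: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def throughput_mb_per_s(real_time, size_mb=FILE_SIZE_MB):
    """Average ingest rate over the whole run, in MB per second."""
    return size_mb / parse_time(real_time)


def speedup(baseline, other):
    """How many times faster `other` finished compared with `baseline`."""
    return parse_time(baseline) / parse_time(other)


if __name__ == "__main__":
    baseline = REAL_TIMES["Case 1"]
    for name, real in REAL_TIMES.items():
        rate = throughput_mb_per_s(real)
        gain = speedup(baseline, real)
        print(f"{name}: {rate:.2f} MB/s, {gain:.2f}x vs Case 1")
```

On these numbers the best single-server run (Case 2.1) averages roughly 1.13 MB/s, about 1.59 times faster than the untuned Case 1.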
Following are the results of the testing done with the three servers in a cluster:
1 server configured as a dedicated master node and 2 as data nodes, with 8 workers, a batch size of 100000 and a heap size of 2GB: approx. 13 mins
2 servers configured as dedicated master nodes and 1 as a data node, with 8 workers, a batch size of 100000 and a heap size of 2GB: approx. 12.41 mins
All 3 servers configured as dedicated master nodes, with 16 workers, a batch size of 10000 and a heap size of 4GB: approx. 15.38 mins
With the above test cases I am not able to achieve the desired reduction in the time taken to forward the logs to Elasticsearch. Can someone please let me know what additional configuration changes are needed to achieve it?