Clustering with three nodes

Hi,

I am trying to do clustering using 3 nodes with the configuration below, but I am not able to achieve the desired time for pushing the logs to Elasticsearch.

Configuration used:

Servers: Ubuntu 16.10
ELK version: 5.3.0
Hardware: two servers with 8 GB RAM and 4 CPUs, one server with 4 GB RAM and 4 CPUs
File size used for testing: 1.1 GB

Configuration file:

input {
  stdin { }
}

filter {}

output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => [ "x.x.x.x", "y.y.y.y", "z.z.z.z" ]
    index => "testing"
  }
}

Following are the results of the testing done on the individual servers, with and without tuning:

Case 1: No ES Tuning

{
  "testing": {
    "settings": {
      "index": {
        "refresh_interval": "1s",
        "number_of_shards": "5",
        "provided_name": "testing",
        "creation_date": "1486813906020",
        "number_of_replicas": "1",
        "uuid": "L5yI-HvmT2yTyOJcy-8BIQ",
        "version": {
          "created": "5010199"
        }
      }
    }
  }
}

# time cat file.txt | /usr/share/logstash/bin/logstash -w 4 -b 10000 -f test.conf

RESULT:
real	25m50.425s
user	15m40.448s
sys	1m5.580s

Case 2.0: With ES tuning (index buffer size increased to 40% and memory lock set to true), Java heap size = 4 GB, and index refresh_interval = 30s

{
  "testing": {
    "settings": {
      "index": {
        "refresh_interval": "30s",
        "number_of_shards": "5",
        "provided_name": "testing",
        "creation_date": "1486813906020",
        "number_of_replicas": "1",
        "uuid": "L5yI-HvmT2yTyOJcy-8BIQ",
        "version": {
          "created": "5010199"
        }
      }
    }
  }
}

# time cat file.txt | /usr/share/logstash/bin/logstash -w 4 -b 10000 -f test.conf

RESULT:
real	19m56.375s
user	13m30.492s
sys	0m53.648s
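For reference, the Case 2.0 tuning described above roughly maps to the following node-level settings (a sketch for Elasticsearch 5.x; it assumes "buffer size" refers to `indices.memory.index_buffer_size` and "mlock" to `bootstrap.memory_lock`):

```
# elasticsearch.yml
indices.memory.index_buffer_size: 40%
bootstrap.memory_lock: true

# jvm.options
-Xms4g
-Xmx4g
```

The `refresh_interval` of 30s shown in the index settings JSON above is a per-index setting, not a node-level one.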

Case 2.1: With a 60s refresh interval, no replicas, workers increased to 8, and batch size increased to 100000

{
  "testing": {
    "settings": {
      "index": {
        "refresh_interval": "60s",
        "number_of_shards": "5",
        "provided_name": "testing",
        "creation_date": "1486813906020",
        "number_of_replicas": "0",
        "uuid": "L5yI-HvmT2yTyOJcy-8BIQ",
        "version": {
          "created": "5010199"
        }
      }
    }
  }
}

# time cat file.txt | /usr/share/logstash/bin/logstash -w 8 -b 100000 -f test.conf

RESULT:
real	16m12.258s
user	10m9.192s
sys	0m34.708s

Following are the results of the testing done with the three servers in a cluster:

1 server configured as a dedicated master node and 2 as data nodes, with 8 workers, a batch size of 100000, and a heap size of 2 GB, required approx 13 min.

2 servers configured as dedicated master nodes and 1 as a data node, with 8 workers, a batch size of 100000, and a heap size of 2 GB, required approx 12.41 min.

All 3 servers configured as master nodes, with 16 workers, a batch size of 10000, and a heap size of 4 GB, required approx 15.38 min.
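In Elasticsearch 5.x the node roles used in these layouts are set per node in elasticsearch.yml; a sketch of the two role profiles:

```
# dedicated master-eligible node
node.master: true
node.data: false
node.ingest: false

# data node (not master-eligible)
node.master: false
node.data: true
```

Note that a cluster where every node is a "dedicated master" has no data nodes, so in the third layout the nodes must in fact have been master-eligible data nodes.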

With the above test cases I am not able to achieve the desired reduction in the time to forward the logs to Elasticsearch. Can someone please let me know what additional configuration changes need to be made to achieve this?


Can you update your post to use code formatting where appropriate? It's really hard to read as is 🙂

One thing you can do is disable the refresh interval (refresh_interval: -1 in elasticsearch.yml) and re-enable it again after all the data has been copied to Elasticsearch.

What's the desirable time?

Don't set that in the config file, use the APIs.
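Via the API, this is a per-index setting that can be changed at runtime; a sketch using the `testing` index from this thread (the host is a placeholder):

```
# disable refresh while bulk loading
curl -XPUT 'localhost:9200/testing/_settings' \
  -d '{ "index": { "refresh_interval": "-1" } }'

# re-enable once loading is done
curl -XPUT 'localhost:9200/testing/_settings' \
  -d '{ "index": { "refresh_interval": "1s" } }'
```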

For a single server, using a refresh interval of 60 seconds (Case 2.1), it is taking approx 16 min. So with three servers I am expecting the time to be around 8-10 minutes.

I have tried changing the settings of the particular index from Dev Tools using a curl command; can you please suggest some other changes that can give me the desired result?

Once data is reliably being written to Elasticsearch, I would recommend removing the stdout output plugin with the rubydebug codec. Have you monitored the different nodes during indexing to identify where the bottleneck is? Is it Elasticsearch or Logstash?

You have also not provided the filter section of your Logstash config, so it is hard for us to tell if there are optimisations to be made there.
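With the stdout plugin removed, the output section from the original post would reduce to just the Elasticsearch output (a sketch, hosts as in the thread):

```
output {
  elasticsearch {
    hosts => [ "x.x.x.x", "y.y.y.y", "z.z.z.z" ]
    index => "testing"
  }
}
```

Printing every event through rubydebug to stdout adds serialisation overhead to each of the millions of events in a 1.1 GB file, so this alone can be a meaningful cost.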

Thanks @warkolm.

During indexing, Logstash was using approx 125%-150% CPU and 4%-5% memory, whereas Elasticsearch was using approx 50% memory and 30%-40% CPU.

I have not applied any filters in the configuration file; below is my conf file:

input {
  stdin { }
}

filter {}

output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => [ "x.x.x.x", "y.y.y.y", "z.z.z.z" ]
    index => "testing"
  }
}

Can you please help me find the exact issue, as I have been struggling with it for a long time.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.