Clustering with three nodes

Hi,

I am trying to do clustering using 3 nodes with the configuration below, but I am not able to achieve the desired time for pushing the logs to Elasticsearch.

Configuration used:

Servers: Ubuntu 16.10
ELK version: 5.3.0
Hardware: two servers with 8 GB RAM and 4 CPUs, one server with 4 GB RAM and 4 CPUs
File size used for testing: 1.1 GB

Configuration file:

input {
  stdin { }
}

filter {}

output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => [ "x.x.x.x", "y.y.y.y", "z.z.z.z" ]
    index => "testing"
  }
}

Following are the results of the testing done on the individual servers, with and without tuning:

Case 1: No ES Tuning

{
  "testing": {
    "settings": {
      "index": {
        "refresh_interval": "1s",
        "number_of_shards": "5",
        "provided_name": "testing",
        "creation_date": "1486813906020",
        "number_of_replicas": "1",
        "uuid": "L5yI-HvmT2yTyOJcy-8BIQ",
        "version": {
          "created": "5010199"
        }
      }
    }
  }
}

# time cat file.txt | /usr/share/logstash/bin/logstash -w 4 -b 10000 -f test.conf

RESULT:
real	25m50.425s
user	15m40.448s
sys	1m5.580s

Case 2.0: With ES tuning (index buffer size increased to 40% and memory lock set to true), Java heap size = 4 GB, and index refresh_interval = 30s

{
  "testing": {
    "settings": {
      "index": {
        "refresh_interval": "30s",
        "number_of_shards": "5",
        "provided_name": "testing",
        "creation_date": "1486813906020",
        "number_of_replicas": "1",
        "uuid": "L5yI-HvmT2yTyOJcy-8BIQ",
        "version": {
          "created": "5010199"
        }
      }
    }
  }
}

# time cat file.txt | /usr/share/logstash/bin/logstash -w 4 -b 10000 -f test.conf

RESULT:
real	19m56.375s
user	13m30.492s
sys	0m53.648s
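For reference, the Case 2.0 tuning described above roughly maps to the following node-level settings (a sketch for Elasticsearch 5.x; it assumes "buffer size" refers to `indices.memory.index_buffer_size` and "mlock" to `bootstrap.memory_lock`):

```
# elasticsearch.yml
indices.memory.index_buffer_size: 40%
bootstrap.memory_lock: true

# jvm.options
-Xms4g
-Xmx4g
```

The `refresh_interval` of 30s shown in the index settings JSON above is a per-index setting, not a node-level one.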

Case 2.1: With a 60s refresh interval, no replicas, workers increased to 8, and batch size increased to 100000

{
  "testing": {
    "settings": {
      "index": {
        "refresh_interval": "60s",
        "number_of_shards": "5",
        "provided_name": "testing",
        "creation_date": "1486813906020",
        "number_of_replicas": "0",
        "uuid": "L5yI-HvmT2yTyOJcy-8BIQ",
        "version": {
          "created": "5010199"
        }
      }
    }
  }
}

# time cat file.txt | /usr/share/logstash/bin/logstash -w 8 -b 100000 -f test.conf

RESULT:
real	16m12.258s
user	10m9.192s
sys	0m34.708s

Following are the results of the testing done with the three servers in a cluster:

1 server configured as a dedicated master node and 2 as data nodes, with 8 workers, a batch size of 100000, and a heap size of 2 GB, required approx 13 min.

2 servers configured as dedicated master nodes and 1 as a data node, with 8 workers, a batch size of 100000, and a heap size of 2 GB, required approx 12.41 min.

All 3 servers configured as master nodes, with 16 workers, a batch size of 10000, and a heap size of 4 GB, required approx 15.38 min.
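In Elasticsearch 5.x the node roles used in these layouts are set per node in elasticsearch.yml; a sketch of the two role profiles:

```
# dedicated master-eligible node
node.master: true
node.data: false
node.ingest: false

# data node (not master-eligible)
node.master: false
node.data: true
```

Note that a cluster where every node is a "dedicated master" has no data nodes, so in the third layout the nodes must in fact have been master-eligible data nodes.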

With the above test cases I am not able to achieve the desired reduction in the time to forward the logs to Elasticsearch. Can someone please let me know what additional configuration changes need to be made to achieve this?


Can you update your post to use code formatting where appropriate? It's really hard to read as is 🙂

One thing you can do is disable the refresh interval (refresh_interval: -1 in elasticsearch.yml) and re-enable it again after all the data has been copied to Elasticsearch.

What's the desirable time?

Don't set that in the config file, use the APIs.
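Via the API, this is a per-index setting that can be changed at runtime; a sketch using the `testing` index from this thread (the host is a placeholder):

```
# disable refresh while bulk loading
curl -XPUT 'localhost:9200/testing/_settings' \
  -d '{ "index": { "refresh_interval": "-1" } }'

# re-enable once loading is done
curl -XPUT 'localhost:9200/testing/_settings' \
  -d '{ "index": { "refresh_interval": "1s" } }'
```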

For a single server, using a refresh interval of 60 seconds (Case 2.1), it is taking approx 16 min. So with three servers I am expecting the time to be around 8-10 minutes.

I have tried changing the settings of the particular index from Dev Tools using a curl command; can you please suggest some other changes that can give me the desired result?

Once data is reliably being written to Elasticsearch, I would recommend removing the stdout output plugin with the rubydebug codec. Have you monitored the different nodes during indexing to identify where the bottleneck is? Is it Elasticsearch or Logstash?

You have also not provided the filter section of your Logstash config, so it is hard for us to tell if there are optimisations to be made there.
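With the stdout plugin removed, the output section from the original post would reduce to just the Elasticsearch output (a sketch, hosts as in the thread):

```
output {
  elasticsearch {
    hosts => [ "x.x.x.x", "y.y.y.y", "z.z.z.z" ]
    index => "testing"
  }
}
```

Printing every event through rubydebug to stdout adds serialisation overhead to each of the millions of events in a 1.1 GB file, so this alone can be a meaningful cost.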

Thanks @warkolm.

During indexing, Logstash was using approx 125%-150% CPU and 4%-5% memory, whereas Elasticsearch was using approx 50% memory and 30%-40% CPU.

I have not applied any filters in the configuration file; below is my conf file:

input {
  stdin { }
}

filter {}

output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => [ "x.x.x.x", "y.y.y.y", "z.z.z.z" ]
    index => "testing"
  }
}

Can you please help me find the exact issue, as I have been struggling with it for a long time.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.