ELK stack with poor performance and mysterious client node

We began using the ELK stack in our company a few months ago as an emergency measure when we lost our log analysis appliance.

We still do not fully understand Elasticsearch, so we are having difficulty solving some problems, such as the poor performance of our cluster.

We have two nodes running on two identical Dell T320 servers: quad-core Xeon 2.2GHz, 8GB RAM, 1TB RAID 1 SATA disks, CentOS 7.1.

One node acts as master and data node (Hubble 1) and the other as data only (Hubble 2). Logstash and Kibana run on the master node.

On both nodes we allocated 6GB of heap for the Elasticsearch JVM. We tested with 4GB and 5GB on the master node, but we saw very high JVM memory consumption.
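For reference, this is roughly how the two nodes are set up. The snippets below are an illustrative sketch assuming the standard RPM layout, not a verbatim copy of our files:

# /etc/elasticsearch/elasticsearch.yml on Hubble 1 (master-eligible and data)
cluster.name: logs
node.name: "Hubble 1"
node.master: true
node.data: true

# /etc/elasticsearch/elasticsearch.yml on Hubble 2 (data only)
cluster.name: logs
node.name: "Hubble 2"
node.master: false
node.data: true

# /etc/sysconfig/elasticsearch on both nodes (6GB of the 8GB RAM goes to the JVM heap)
ES_HEAP_SIZE=6g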

Suddenly, a third node appeared on the same machine as the master node. I do not know where it came from, I cannot remove it, and I do not even know whether it is one of the causes of the cluster's poor performance.

Another problem is that all Kibana searches load only the master node; the second node never seems to do any work when monitored with the top command.

[root@hubble ~]# curl 'localhost:9200/_cat/health?v'

 epoch      timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks
1441287271  10:34:31     logs  green          3         2    802 401    0    0        0             0

[root@hubble ~]# curl 'localhost:9200/_cat/nodes?v'

host         ip          heap.percent ram.percent load node.role master name
hubble.cnen  10.10.4.201           75          79 1.73 d         *      Hubble 1
hubble.cnen  10.10.4.201           49                  c         -      logstash-hubble.cnen-1748-11694
hubble2.cnen 10.10.4.123           73          63 0.03 d         -      Hubble 2

[root@hubble ~]# curl 'localhost:9200/_cat/indices?v'

health status index               pri rep docs.count docs.deleted store.size pri.store.size
green  open   logstash-2015.08.04   5   1   15475985            0      9.3gb          4.6gb
green  open   logstash-2015.07.31   5   1   14165367            0      8.5gb          4.2gb
green  open   logstash-2015.09.02   5   1   18585154            0     10.8gb          5.4gb
green  open   logstash-2015.06.27   5   1    2541444            0      1.6gb          850mb
green  open   logstash-2015.08.15   5   1    2369095            0      1.7gb        918.7mb
green  open   logstash-2015.06.22   5   1     665854            0      238mb          119mb
green  open   logstash-2015.07.10   5   1   16204565            0      9.4gb          4.7gb
green  open   logstash-2015.07.26   5   1    1893079            0      1.4gb        724.2mb
green  open   logstash-2015.08.14   5   1   17434369            0     10.3gb          5.1gb
green  open   logstash-2015.09.01   5   1   18217711            0     10.6gb          5.3gb
green  open   .kibana               1   1         25            2    127.7kb         63.8kb
green  open   logstash-2015.06.29   5   1   19674347            0     11.3gb          5.6gb
green  open   logstash-2015.08.23   5   1    2070947            0      1.5gb        783.9mb
green  open   logstash-2015.06.28   5   1    2301173            0      1.5gb        772.1mb
green  open   logstash-2015.07.14   5   1   17208419            0     10.1gb            5gb
green  open   logstash-2015.08.24   5   1   19739441            0     11.4gb          5.7gb
green  open   logstash-2015.07.16   5   1   16974541            0       10gb            5gb
green  open   logstash-2015.08.02   5   1    1774571            0      1.3gb          676mb
green  open   logstash-2015.08.22   5   1    2213696            0      1.6gb        852.5mb
green  open   logstash-2015.08.16   5   1    1930990            0      1.4gb        722.9mb
green  open   logstash-2015.07.24   5   1   16207758            0      9.5gb          4.7gb
green  open   logstash-2015.06.18   5   1    1951673            0    798.9mb        399.4mb
green  open   logstash-2015.07.19   5   1    2047045            0      1.4gb          765mb
green  open   logstash-2015.08.05   5   1   18347586            0     10.8gb          5.4gb
green  open   logstash-2015.07.07   5   1   10806601            0      6.7gb          3.3gb
green  open   logstash-2015.07.03   5   1   17902529            0     10.4gb          5.2gb
green  open   logstash-2015.08.07   5   1   17170460            0     10.1gb            5gb
green  open   logstash-2015.07.09   5   1   15715564            0      8.7gb          4.3gb
green  open   logstash-2015.09.03   5   1    6467647            0      4.1gb            2gb

*There are fifty-two more lines of indices, but we removed them because they exceeded the post's 5000-character limit.

Could some kind soul tell us where we are going wrong?
:sweat_smile:

Suddenly, a third node appeared on the same machine as the master node. I do not know where it came from, I cannot remove it, and I do not even know whether it is one of the causes of the cluster's poor performance.

The third node is Logstash if you're using the node protocol (which you apparently are). That's not why you're having poor performance.
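In case it helps, this is roughly what the relevant part of a Logstash elasticsearch output looks like; the values below are placeholders, not taken from your setup:

output {
  elasticsearch {
    # "node" makes Logstash join the ES cluster as a client node --
    # that is the logstash-hubble.cnen-... entry in your _cat/nodes output.
    # protocol => "node"

    # "http" keeps Logstash out of the cluster; events go over port 9200 instead.
    protocol => "http"
    host     => "localhost"                 # placeholder
    index    => "logstash-%{+YYYY.MM.dd}"
  }
}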

Another problem is that all Kibana searches load only the master node; the second node never seems to do any work when monitored with the top command.

Presumably Kibana is configured to talk to ES on localhost, in which case that ES node will get a higher load. How much higher depends on what kind of queries are made, what timespan you're querying and how the shards are distributed across the machines.
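Assuming Kibana 4, the node it queries is a single URL in config/kibana.yml; the value below is just the default:

# config/kibana.yml -- every Kibana query goes to this one node,
# which then does the coordination work for all dashboard searches.
elasticsearch_url: "http://localhost:9200"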

There's a fixed overhead on each shard and five shards a day is probably too much with just a couple of nodes.
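If you want to bring that down for future indices, here is a sketch of an index template that overrides the shard count for new logstash-* indices (the template name is arbitrary; existing indices keep their five shards):

curl -XPUT 'localhost:9200/_template/logstash_fewer_shards' -d '{
  "template": "logstash-*",
  "order": 1,
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}'

Only indices created after the template is installed pick up the new setting.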

I think we have a configuration problem with the two nodes.

Just now I tried to check the health of the cluster and waited nearly five minutes for the result. Here it is:

[root@hubble ~]# curl 'localhost:9200/_cat/health?v'

epoch      timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks
1441312317 17:31:57  logs    yellow          2         1    401 401    0    0      401             0

Half of the shards became unassigned.

I tried to check the cluster health from the second node, and after a few minutes...

[root@hubble2 ~]# curl 'localhost:9200/_cat/health?v'

{"error":"MasterNotDiscoveredException[waited for [30s]]","status":503}