We have some application servers and we would like to ship all their logs into ES.
Therefore I set up an ES 2.0.0 cluster of 8 nodes on 5 different virtual machines (we are using Xen):
- 3 machines with 1 data node and 1 master node each
- 1 machine with Logstash, RabbitMQ and a client node
- 1 machine with Kibana and a client node
VM with data and master node:
- 16 GB memory in total
- data node: 12 GB ES_HEAP
- master node: 512 MB ES_HEAP
- 4 cores in total
- directly attached RAID array of spinning disks
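For reference, the heap sizes above are set per node via ES_HEAP_SIZE; a minimal sketch for the data node instance (assuming the stock Debian-style package defaults — the master node instance uses 512m instead):

```
# /etc/default/elasticsearch for the data node instance
ES_HEAP_SIZE=12g
```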
VM with Logstash:
- queue: Logstash with a multiline input codec ---> RabbitMQ ---> up to 4 Logstash instances for filtering, with output to ES (client node on the Logstash machine, ES data nodes 1, 2, 3)
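A sketch of the shipper side of that pipeline (hostnames, paths, queue names and the multiline pattern are placeholders, not our actual values):

```
# Logstash shipper: reads app logs, joins multi-line events (stack traces),
# pushes to RabbitMQ. All names below are placeholders.
input {
  file {
    path => "/var/log/app/*.log"
    codec => multiline {
      # lines that do not start with a timestamp belong to the previous event
      pattern => "^%{TIMESTAMP_ISO8601}"
      negate => true
      what => "previous"
    }
  }
}
output {
  rabbitmq {
    host => "rabbitmq.example.local"
    exchange => "logs"
    exchange_type => "direct"
    key => "app"
    durable => true
  }
}
```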
Our application is distributed across 18 machines with nginx/pound reverse proxies for load balancing.
Shipping and processing of the nginx and pound logs works smoothly. The index size is about 12 GB per day with about 10 million documents, and the indexing rate is about 300-500 docs/sec.
When we activate shipping of the app-server logs we run into performance issues. The required indexing rate is about 1500-2000 docs/sec, and the documents are a bit larger (they contain Java stack traces, for example). The index size grows to up to 70 GB per day.
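The indexer side of the pipeline looks roughly like this (hostnames are placeholders, the filter section is omitted, and the flush_size value is just an example of the knob we could tune):

```
# Logstash indexer: consumes from RabbitMQ, filters, bulk-indexes into ES.
# Hostnames and queue names are placeholders.
input {
  rabbitmq {
    host => "rabbitmq.example.local"
    queue => "app-logs"
    durable => true
  }
}
output {
  elasticsearch {
    hosts => ["logstash-client.example.local:9200",
              "data1.example.local:9200",
              "data2.example.local:9200",
              "data3.example.local:9200"]
    # bulk request size; this (and the number of output workers)
    # is what we could tune for throughput
    flush_size => 500
  }
}
```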
Our monitoring tool frequently sends requests to ES asking for certain interface states of the application; for that it analyses the last 5 minutes of data.
The problem is that those last 5 minutes are not available when the RabbitMQ queue is growing due to poor indexing performance.
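The monitoring requests are essentially range filters on the event timestamp, along these lines (index pattern and the `type` term are placeholders for what our tool actually sends):

```
# Fetch the last 5 minutes of app events (names are placeholders)
GET /logstash-*/_search
{
  "query": {
    "bool": {
      "filter": [
        { "range": { "@timestamp": { "gte": "now-5m" } } },
        { "term": { "type": "app" } }
      ]
    }
  }
}
```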
My question is: am I expecting too much of this setup, or is this typical performance for a setup like ours?
Thanks in advance!
Below you'll find the configuration of our data nodes:
======================== Elasticsearch Configuration DATA NODE =========================
network.host: IP of the vm
discovery.zen.ping.unicast.hosts: ["List of Our IP:Port combinations"]