Dear Folks:
I am running some scaling tests on Elasticsearch. We are considering using
Elasticsearch to store a large volume of log file data, and a key metric for
us is write throughput. We've been trying to understand how Elasticsearch
scales. We were hoping to see some form of linear scaling: add new nodes,
increase the shard count, etc., and be able to handle an increased number of
writes. But no matter what lever we pull, we're not seeing an increase in
write throughput, so clearly we're missing something.
We have tested several different clusters on AWS (details below). We have
doubled the number of nodes; we have doubled (and doubled again) the number
of shards; we have doubled the number of servers feeding the cluster log
file data; we have put the cluster's nodes behind an AWS load balancer; and
we have increased the number of open files allowed by Ubuntu. None of these
changes has had any effect on the number of records indexed per second.
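In case it matters, when we want to pin the shard/replica count for a
particular test index explicitly (rather than relying on the node config
shown below), the call looks roughly like this; the index name is just a
placeholder:

curl -s -XPUT 'http://localhost:9200/scaling-test' -d '
{
  "settings": {
    "index": {
      "number_of_shards": 40,
      "number_of_replicas": 1
    }
  }
}'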
Details:
We are piping JSON data using fluentd with the fluentd Elasticsearch plugin,
storing the log file entries in the Logstash format that the plugin
supports. Note that we are not using Logstash itself as a data pipeline.
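For context, with the Logstash format enabled the plugin writes into
date-stamped indices over the bulk API; the request shape is roughly like
the following (the index/type names and fields here are illustrative, not
our exact data):

echo '{"index":{"_index":"logstash-2013.12.01","_type":"fluentd"}}
{"@timestamp":"2013-12-01T12:00:00Z","message":"example log line"}' \
  | curl -s -XPOST 'http://localhost:9200/_bulk' --data-binary @-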
We configured the cluster with only 1 replica, the default.
We increased the ulimit for open files to 32000.
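(Roughly what we run in the shell that launches each node; the exact
mechanism differs if Elasticsearch is started as a service:)

ulimit -n 32000    # raise the open-file limit for this shell and its children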
Elasticsearch configuration:
discovery.type: ec2
cloud:
  aws:
    region: us-west-2
    groups: elastic-search
cloud.node.auto_attributes: true
index.number_of_shards: 40   # we varied this between 10 and 40
index.refresh_interval: 60s
bootstrap.mlockall: true
discovery.zen.ping.multicast.enabled: false
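To double-check what the cluster actually applied (shard counts, refresh
interval, mlockall, file descriptor limits), we can hit the REST API
directly; a sketch, assuming a node is reachable on localhost:9200 (exact
endpoint paths vary a bit between 0.90 and 1.x):

curl -s 'http://localhost:9200/_cluster/health?pretty'   # shard allocation summary
curl -s 'http://localhost:9200/_all/_settings?pretty'    # per-index number_of_shards / refresh_interval
curl -s 'http://localhost:9200/_nodes?pretty'            # process section shows mlockall and max_file_descriptors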
Here's how we set up the memory configuration in setup.sh:
#!/bin/sh
export ES_MIN_MEM=16g
export ES_MAX_MEM=30g
bin/elasticsearch -f
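A quick sanity check that these heap settings are actually picked up by the
JVM (the jvm.mem heap_init/heap_max figures should line up with ES_MIN_MEM
and ES_MAX_MEM); again assuming a node on localhost:9200:

curl -s 'http://localhost:9200/_nodes?jvm=true&pretty'   # look at jvm.mem per node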
Each node in the cluster is m1.xlarge, which has the following
characteristics:
Instance Family: General purpose
Instance Type: m1.xlarge
Processor Arch: 64-bit
vCPU: 4
ECU: 8
Memory: 15 GiB
Instance Storage: 4 x 420 GB
EBS-optimized Available: Yes
Network Performance: High
We've monitored the cluster using http://www.elastichq.org. For each index,
we calculated (Indexing Total)/(Indexing Time) to get the number of records
indexed per second. Whether we have a single node with 10 shards or 6 nodes
with 40 shards, we consistently see indexing occurring at a rate of around
3000-3500 records/second. It is this measure that we never seem to be able
to increase.
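(We can pull the same counters straight from the indices stats API, outside
of ElasticHQ; the index pattern below is ours, and the division is done by
hand:)

curl -s 'http://localhost:9200/logstash-*/_stats?pretty'
# records/second ~= indexing.index_total / (indexing.index_time_in_millis / 1000)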
Here are some characteristic stats for an individual node:
Heap Used: 1.1gb
Heap Committed: 7.9gb
Non Heap Used: 42.3mb
Non Heap Committed: 66.2mb
JVM Uptime: 4h
Thread Count/Peak: 62 / 82
GC Count: 8156
GC Time: 4.8m
Java Version: 1.7.0_45
JVM Vendor: Oracle Corporation
JVM: Java HotSpot(TM) 64-Bit Server VM
In general we have not seen memory contention or high CPU utilization (CPU
usage is around 80% out of a possible 400%). We are not yet doing any
reading from the cluster (that would be a subsequent test).
Here are some more stats:
Open File Descriptors: 795
CPU Usage: 68% of 400%
CPU System: 1.5m
CPU User: 26.6m
CPU Total: 28.2m
Resident Memory: 8.4gb
Shared Memory: 19.6mb
Total Virtual Memory: 10.3gb
Thanks for any help.