Performance issues on EC2/EBS

Hello,

We are not getting the desired performance out of our Elasticsearch
cluster. Here is the setup:

  • 4 nodes
  • each node is an EC2 m1.large, 2 cpu, 7.5 gb memory
  • data on EBS volumes
  • 1 index, 2,956,699 documents, 30 shards, 0 replicas
  • HAProxy round robin to each node, 1 second connection timeout, 5
    second response timeout
  • A document consists of 15 to 50 fields
  • Fields are mostly not analyzed strings, longs, floats and a few dates.
    We have no analyzed strings at all.
  • ES_HEAP_SIZE = 5gb (each node has total of 7.5 gb)
  • bootstrap.mlockall: true
  • indices.memory.index_buffer_size: 50%
  • index.refresh_interval: 30
  • index.translog.flush_threshold_ops: 50000

The profile of our work is many small jobs of:

  1. Retrieve document (not search)
  2. Change a field
  3. Index document

When HAProxy reports a session rate of about 100 (which is to be read as
100 requests/sec, I think), we start getting connection timeouts and
response timeouts.

100 index operations/sec seems pretty low, even for our very modest hardware.

We keep trying new things (adding more nodes, tuning, etc.) and our
next experiment is to take the data off the EBS volumes. It takes a lot of
time and effort to try these experiments, so I was hoping to post our setup
here and maybe get a push in the right direction, rather than stumbling
around blindly.

Thanks for the help.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Christopher J. Bottaro wrote:

We are not getting the desired performance out of our Elasticsearch
cluster. Here is the setup:

  • 4 nodes
  • each node is an EC2 m1.large, 2 cpu, 7.5 gb memory
  • data on EBS volumes
  • 1 index, 2,956,699 documents, 30 shards, 0 replicas
  • HAProxy round robin to each node, 1 second connection timeout, 5
    second response timeout

[...]

  • index.refresh_interval: 30

You probably mean 30s. Raise this to minutes, or even disable it (-1),
if you can afford the delay in docs showing up in search.
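
A rough sketch of changing the setting on a live index from Python, in case it helps. It assumes the index settings update endpoint (PUT /<index>/_settings) behaves on your version the way I remember it, so check it against your docs. The index name "myindex" and both function names are made up for illustration.

```python
import http.client
import json


def refresh_interval_request(index, interval):
    """Build (method, path, body) for an index settings update.

    `interval` is a string with a unit, e.g. "30s" or "5m", or "-1" to disable.
    """
    body = json.dumps({"index": {"refresh_interval": interval}})
    return "PUT", "/%s/_settings" % index, body


def set_refresh_interval(conn, index, interval):
    """Send the settings update over an existing http.client connection."""
    method, path, body = refresh_interval_request(index, interval)
    conn.request(method, path, body=body,
                 headers={"Content-Type": "application/json"})
    resp = conn.getresponse()
    resp.read()  # drain the response so the connection can be reused
    return resp.status
```

For example, set_refresh_interval(conn, "myindex", "5m") with conn = http.client.HTTPConnection("localhost", 9200), or pass "-1" to turn refresh off during heavy indexing.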

  • index.translog.flush_threshold_ops: 50000

Don't set this; the default should be fine.

When HAProxy reports a session rate of about 100 (which is to be read
as 100 requests/sec, I think), we start getting connection timeouts
and response timeouts.

That's the rate of new connections through the proxy, not of
requests. You want this number to be as low as possible, around a
handful per data node. Make sure your HTTP client is using keep-alives.
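
As a minimal sketch of the get/change/index job running over one kept-alive connection, here is what it could look like with Python's standard http.client, which speaks HTTP/1.1 and reuses the socket as long as each response is read in full. The document paths (/myindex/mytype/<id>), the field name, and the function names are hypothetical, and I'm assuming the GET response carries the document under "_source". Your client library will differ.

```python
import http.client
import json


def update_field(conn, path, field, value):
    """Fetch a document, change one field, and index it again on `conn`."""
    # 1. Retrieve the document by ID (a GET, not a search).
    conn.request("GET", path)
    resp = conn.getresponse()
    # Read the whole body so the same socket can carry the next request.
    doc = json.loads(resp.read())["_source"]

    # 2. Change a field.
    doc[field] = value

    # 3. Index the document back under the same ID.
    conn.request("PUT", path, body=json.dumps(doc),
                 headers={"Content-Type": "application/json"})
    resp = conn.getresponse()
    resp.read()
    return resp.status


def run_jobs(host, port, paths, field, value):
    """Run many small jobs over a single connection opened once."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        return [update_field(conn, p, field, value) for p in paths]
    finally:
        conn.close()
```

Something like run_jobs("localhost", 9200, ["/myindex/mytype/1", "/myindex/mytype/2"], "status", "done") opens one connection for all the jobs. Opening a fresh connection per request is what drives the proxy's session rate up.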

EBS likely isn't your bottleneck here. You probably have some
client issues you could work out on a laptop first and avoid the
overhead of the deployment loop.

Drew
