Poor Update Performance Despite Refresh Interval Compromise

Running a 2-node cluster, we're experiencing less than ideal update times
even after adjusting the refresh interval.

Settings are:
"number_of_replicas":"1","number_of_shards":"5","refresh_interval":"5s"

The two VMs each have 4 cores and 7 GB of RAM. The following response
times are typical (averaged over 2-3 months):

Imported 2475 documents in 7107 milliseconds
Imported 2475 documents in 4862 milliseconds
Imported 2475 documents in 6015 milliseconds
Imported 2475 documents in 5991 milliseconds
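
To compare these figures with throughput numbers quoted elsewhere, the batch timings above can be converted to documents per second. The following is only a rough sketch: the `runs` list is copied from the report above, and `docs_per_second` is a hypothetical helper name, not part of any client library.

```python
# (documents, milliseconds) pairs copied from the report above
runs = [(2475, 7107), (2475, 4862), (2475, 6015), (2475, 5991)]


def docs_per_second(docs, millis):
    """Convert a batch size and its duration in milliseconds to docs/second."""
    return docs / (millis / 1000.0)


rates = [docs_per_second(d, ms) for d, ms in runs]

for (d, ms), rate in zip(runs, rates):
    # Also show the average time spent per document in this batch.
    print(f"{d} docs in {ms} ms = {rate:.0f} docs/s, {ms / d:.2f} ms/doc")
```

This works out to roughly 350 to 510 documents per second, or about 2 to 3 ms per document.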

My understanding of the reported times (using Elasticsearch.NET's
IBulkRequest Took) is that they exclude the network delay of getting the
request to ES and cover only the server processing time:
https://github.com/elasticsearch/elasticsearch-net/issues/453

Are these times considered below/average/above given your experience? Can
anything else be done to improve indexing performance here?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/44863f41-1eac-4294-a710-c00d7ea8e1bc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

On Wed, Apr 23, 2014 at 10:01 AM, Nariman Haghighi auspicious@gmail.com wrote:

Are these times considered below/average/above given your experience? Can
anything else be done to improve indexing performance here?

It really depends on the size of the documents and how CPU-heavy the
analysis is. I feel pretty good when I can get 5,000 documents per second
across 16 servers with 96GB of RAM and 12 (couple-year-old) CPUs. But my
documents are generally a couple hundred KB and range up into tens of MB.
OTOH, I can't overwhelm the servers because they are still serving searches
during this time, so I try to keep the bump in CPU load from this kind of
bulk indexing around 25%.

The standard advice is to shut off the refresh interval during bulk loads
(setting refresh_interval to -1) if you can get away with it, and to make
sure you are sending the bulk requests from many threads.

Nik
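
The advice above (disable refresh for the duration of the load, then send bulk requests from several threads) might be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: `send(method, path, body)` is a hypothetical transport callback standing in for whatever HTTP client is in use, `bulk_load`, `bulk_body`, and `chunked` are made-up helper names, and the batch size and worker count are arbitrary starting points. The action line also omits `_type`, which older Elasticsearch versions expect.

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Bodies for PUT /<index>/_settings.
DISABLE_REFRESH = {"index": {"refresh_interval": "-1"}}  # -1 turns periodic refresh off
RESTORE_REFRESH = {"index": {"refresh_interval": "5s"}}  # value from the original post


def chunked(docs, size):
    """Yield successive slices of `docs` with at most `size` items each."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def bulk_body(index, docs):
    """Build a newline-delimited _bulk body: an action line, then a source line, per doc."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["id"]}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


def bulk_load(index, docs, send, batch_size=500, workers=4):
    """Send `docs` in parallel batches; `send` is a hypothetical transport (assumption)."""
    send("PUT", f"/{index}/_settings", json.dumps(DISABLE_REFRESH))
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = [
                pool.submit(send, "POST", "/_bulk", bulk_body(index, batch))
                for batch in chunked(docs, batch_size)
            ]
            results = [f.result() for f in futures]  # re-raises any worker error
    finally:
        # Always restore the refresh interval, even if a batch failed.
        send("PUT", f"/{index}/_settings", json.dumps(RESTORE_REFRESH))
    return results
```

Restoring the interval in a `finally` block keeps a failed load from leaving the index with refresh disabled. Whether more threads help depends on how much spare CPU the nodes have, as noted above for clusters that are also serving searches.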
