OK, thanks for the advice.
On Mon, Mar 5, 2012 at 1:31 PM, Shay Banon kimchy@gmail.com wrote:
Note that the merge factor parameter does not apply to the default tiered
merge policy. In any case, setting it to 1 is not recommended, since you
can always control the number of segments it will optimize down to in the
optimize API call.
On Monday, March 5, 2012 at 6:45 PM, Craig Brown wrote:
We're running on AWS: 4 C1-XL nodes, each with 7GB RAM and 20 compute units
(8 virtual cores). We allocate 4GB RAM to ES. Each node has one 500GB EBS
volume for storage. We run 26 shards with 0 replicas when indexing. It's MUCH
faster to index with 0 replicas if you can, then raise the replica count after
indexing, than it is to index with 1 or more replicas. We set
refresh_interval to 30s and merge.policy.merge_factor to 30. After
indexing, we set them back to 1s and 1 and run optimize. This really helps.
Our documents are about 2k-5k in size and we index about 10k-12k docs/sec
initially. After 240m docs, we're in the 5k-6k docs/sec range. We wrote our
own multi-threaded indexing tool to do the work. We enable _source and
compression on _source. We still have _all enabled though we are not using
it. We'll disable that in the next round.
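For anyone wanting to script the before/after settings switch described above, here is a minimal Python sketch. It assumes the index settings can be changed with a PUT to the index's _settings endpoint; the function names, base URL and index name are hypothetical. merge_factor is left out on purpose, since elsewhere in this thread it is noted that it does not apply to the default tiered merge policy.

```python
import json
import urllib.request


def bulk_load_settings():
    """Settings for the indexing phase (values taken from the message above)."""
    return {"index": {"number_of_replicas": 0, "refresh_interval": "30s"}}


def post_load_settings(replicas=1):
    """Settings to restore once indexing is done; the replica count is an assumption."""
    return {"index": {"number_of_replicas": replicas, "refresh_interval": "1s"}}


def put_settings(base_url, index, settings):
    """Send the settings body to <base_url>/<index>/_settings and return the reply."""
    request = urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=json.dumps(settings).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage would look like put_settings("http://localhost:9200", "myindex", bulk_load_settings()) before the run and put_settings(..., post_load_settings()) afterwards, followed by the optimize call.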
On Mon, Mar 5, 2012 at 9:11 AM, haarts harmaarts@gmail.com wrote:
Thanks a lot for the insight! I'd better convince my boss to buy 16-disk
machines.
On Monday, 5 March 2012 17:03:38 UTC+1, Thomas Peuss wrote:
Hi!
On Monday, 5 March 2012 15:21:11 UTC+1, haarts wrote:
Those are some impressive numbers. Would you mind sharing what kind of
machines you are running? We are struggling to index 500M documents,
reaching 1000+ inserts per second on a 3-node cluster (8-core i7, 24GB, 1
simple spinner). Indexing performance is acceptable, but first-time query
performance isn't great (seconds...).
We are running an 8-node cluster in two datacenters (4 nodes per DC). Each
machine has 24 cores, 32GB RAM and 8 disks (extendable to 16 disks) running
RHEL 6.1. The machines are not dedicated to ES alone (we use 50% of the
cores for number crunching without I/O involved). Currently we are running
with 16 shards and 1 replica.
We are currently peaking at 400 docs/s but the numbers are rising...
You should try to insert with many threads in parallel (we use 16).
It is important that each thread waits for the response from ES before
sending its next request, because otherwise you will overload ES.
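As a rough illustration of that pattern (not our actual indexing tool), a minimal Python sketch could look like the following. The send_doc callable is hypothetical: it stands for whatever client call indexes one document and blocks until ES has answered. The default of 16 workers simply mirrors the number mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor


def index_all(docs, send_doc, workers=16):
    """Index every document using a fixed-size pool of worker threads.

    send_doc(doc) is assumed to block until ES has responded, so at most
    `workers` requests are in flight at any time.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order and re-raises the first
        # exception a worker hit, so failures are not silently dropped.
        return list(pool.map(send_doc, docs))
```

Because each worker waits for its response before taking the next document, the pool size acts as the cap on concurrent requests. For a very large input, feeding the pool in batches keeps the pending queue small.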
CU
Thomas
--
…
CRAIG BROWN
chief architect
youwho, Inc.
www.youwho.com http://www.youwho.com/
T: 801.855. 0921
M: 801.913. 0939