Improve indexing throughput

Craig_Brown · March 6, 2012, 5:02pm

When you connect to the cluster this way, your client becomes a node in the
cluster and has access to all of the routing and other information. The
client is therefore much more efficient because it can directly communicate
with the node that has to do the work.
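The efficiency point comes down to routing. Elasticsearch chooses a document's shard with a formula of the form shard = hash(routing) % number_of_primary_shards, where routing defaults to the document id. A client that holds the routing table can send each document straight to the node that owns its shard. A client without that table sends to some node, which may then have to forward the request. The sketch below is illustrative only. It substitutes Java's String.hashCode() for Elasticsearch's internal hash function. The round-robin shard-to-node table, the node names, and the 26-shard/4-node sizes are hypothetical example values.

```java
import java.util.HashMap;
import java.util.Map;

public class RoutingSketch {
    // Hypothetical routing table: primary shard number -> node name.
    // Real shard placement is decided by the cluster, not round-robin.
    static Map<Integer, String> buildShardToNode(int numShards, String[] nodes) {
        Map<Integer, String> table = new HashMap<>();
        for (int shard = 0; shard < numShards; shard++) {
            table.put(shard, nodes[shard % nodes.length]);
        }
        return table;
    }

    // shard = hash(routing) % numShards. String.hashCode() stands in for the
    // real hash; floorMod keeps the result non-negative for negative hashes.
    static int shardFor(String routing, int numShards) {
        return Math.floorMod(routing.hashCode(), numShards);
    }

    public static void main(String[] args) {
        int numShards = 26; // example value
        String[] nodes = {"node-1", "node-2", "node-3", "node-4"}; // hypothetical
        Map<Integer, String> shardToNode = buildShardToNode(numShards, nodes);
        for (String id : new String[] {"doc-1", "doc-2", "doc-3"}) {
            int shard = shardFor(id, numShards);
            // A routing-aware client would send this document directly here.
            System.out.println(id + " -> shard " + shard + " on " + shardToNode.get(shard));
        }
    }
}
```

Because the same id always hashes to the same shard, a client that knows the table never needs an extra hop.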

Craig

On Tue, Mar 6, 2012 at 12:52 AM, Lior Cohen barnea <
liorcohenbarnea@gmail.com> wrote:

Thanks for all the answers.

It seems that when I used a client node:

Node node = nodeBuilder().client(true).node();
Client client = node.client();

my indexing was much faster than when I used a transport client connected
to a local node...

Should it be like that?

On Mar 5, 10:31 pm, Shay Banon kim...@gmail.com wrote:

Note that the merge factor parameter does not apply to the default
tiered merge policy. In any case, setting it to 1 is not recommended, since
you can always control the number of segments it will optimize down to in
the optimize API call.

On Monday, March 5, 2012 at 6:45 PM, Craig Brown wrote:

We're running on AWS with 4 c1.xlarge nodes: 7GB RAM and 20 compute units
(8 virtual cores) each. We allocate 4GB of RAM to ES. Each node has a single
500GB EBS volume for storage. We run 26 shards with 0 replicas while
indexing. It's MUCH faster to index with 0 replicas, if you can, and raise the
replica count after indexing, than to index with 1 or more replicas. We set
refresh_interval to 30s and merge.policy.merge_factor to 30. After
indexing, we set them back to 1s and 1 and run optimize. This really helps.
Our documents are about 2k-5k in size, and we index about 10k-12k
docs/sec initially. After 240M docs, we're in the 5k-6k docs/sec range. We
wrote our own multi-threaded indexing tool to do the work. We enable
_source and compression on _source. We still have _all enabled, though we
are not using it. We'll disable that in the next round.
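The bulk-load recipe above amounts to two settings updates around the load, followed by an optimize. The sketch below only builds the request bodies and paths. It assumes the bodies go to the update-settings endpoint (PUT /{index}/_settings) and that the index is called `myindex`, which is a hypothetical name. The class and method names are also illustrative. Restoring the replica count to 1 is an assumption, so use whatever your cluster needs. The merge factor is left out because, as noted elsewhere in this thread, it does not apply to the default tiered merge policy.

```java
public class BulkLoadSettings {
    // Applied before the load: no replicas, so each document is written once,
    // and a slow refresh, so new segments are made searchable less often.
    static String beforeLoad() {
        return "{\"index\": {\"number_of_replicas\": 0, \"refresh_interval\": \"30s\"}}";
    }

    // Applied after the load. The replica count is an assumption.
    static String afterLoad(int replicas) {
        return "{\"index\": {\"number_of_replicas\": " + replicas
                + ", \"refresh_interval\": \"1s\"}}";
    }

    // Optimize request path; max_num_segments caps the segments per shard.
    static String optimizePath(String index, int maxNumSegments) {
        return "/" + index + "/_optimize?max_num_segments=" + maxNumSegments;
    }

    public static void main(String[] args) {
        String index = "myindex"; // hypothetical index name
        System.out.println("PUT /" + index + "/_settings " + beforeLoad());
        System.out.println("... run the bulk load ...");
        System.out.println("PUT /" + index + "/_settings " + afterLoad(1));
        System.out.println("POST " + optimizePath(index, 1));
    }
}
```

Later Elasticsearch releases renamed the optimize endpoint to _forcemerge. Check the path against the version you run.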

Craig

On Mon, Mar 5, 2012 at 9:11 AM, haarts <harmaa...@gmail.com> wrote:

Thanks a lot for the insight! I'd better convince my boss to buy 16-disk
machines.

On Monday, 5 March 2012 17:03:38 UTC+1, Thomas Peuss wrote:

Hi!

On Monday, 5 March 2012 15:21:11 UTC+1, haarts wrote:

Those are some impressive numbers. Would you mind sharing what
kind of machines you are running? We are struggling to index 500M
documents, reaching 1000+ inserts per second on a 3-node cluster (8-core i7,
24GB RAM, a single spinning disk). Indexing performance is acceptable, but
first-time query performance isn't great (seconds...).
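For scale, the wall-clock time of a load is the document count divided by the sustained rate. The worked sketch below uses the 500M-document load described above and two rates quoted in this thread. Holding the rate constant for the whole load is a simplifying assumption. Craig reports his rate roughly halving as his index grew, so real loads will run longer.

```java
public class LoadTime {
    // Seconds needed to index `docs` documents at a constant `docsPerSec`
    // (a simplifying assumption; real rates drop as the index grows).
    static double seconds(long docs, double docsPerSec) {
        return docs / docsPerSec;
    }

    public static void main(String[] args) {
        long docs = 500_000_000L;         // the 500M-document load above
        double[] rates = {1_000, 10_000}; // docs/s figures quoted in this thread
        for (double rate : rates) {
            double s = seconds(docs, rate);
            System.out.printf("%.0f docs/s: %.0f s = %.1f h = %.1f days%n",
                    rate, s, s / 3600, s / 86400);
        }
    }
}
```

At 1,000 docs/s, the 500M documents take 500,000 s, roughly 139 hours or just under six days. At 10,000 docs/s they take about 13.9 hours.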

We are running an 8-node cluster in two datacenters (4 nodes per
DC). Each machine has 24 cores, 32GB RAM and 8 disks (expandable to 16
disks) and runs RHEL 6.1. The machines are not dedicated to ES alone (we use
50% of the cores for number crunching that involves no I/O). Currently we
are running with 16 shards and 1 replica.

We are currently peaking at 400 docs/s but the numbers are
rising...

You should try to insert with many threads in parallel (we use
16). It is important that you wait for the response from ES;
otherwise you will overload it.
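The advice above (many threads in parallel, each waiting for its response) can be sketched with a fixed thread pool. This is a minimal sketch. It assumes a hypothetical `indexDocument` method that stands in for a blocking index request to ES and only simulates latency. The 16 threads match the figure quoted above. Each worker blocks until its own request returns, so no more than 16 requests are outstanding at once. That cap is what keeps the cluster from being flooded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelIndexer {
    static final AtomicInteger inFlight = new AtomicInteger();
    static final AtomicInteger maxInFlight = new AtomicInteger();

    // Hypothetical stand-in for a blocking index request to ES: it returns
    // only after the simulated "response" arrives.
    static void indexDocument(String doc) throws InterruptedException {
        int now = inFlight.incrementAndGet();
        maxInFlight.accumulateAndGet(now, Math::max); // record peak concurrency
        try {
            Thread.sleep(5); // simulated request latency
        } finally {
            inFlight.decrementAndGet();
        }
    }

    // Indexes all docs with `threads` workers and returns the number indexed.
    static int indexAll(List<String> docs, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> results = new ArrayList<>();
            for (String doc : docs) {
                // Each task blocks on its own request, so at most `threads`
                // requests are outstanding at any moment.
                results.add(pool.submit(() -> { indexDocument(doc); return null; }));
            }
            for (Future<?> f : results) {
                f.get(); // wait for every response and surface any failure
            }
            return results.size();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 200; i++) docs.add("{\"id\": " + i + "}");
        int n = indexAll(docs, 16);
        System.out.println("indexed " + n + ", max in flight " + maxInFlight.get());
    }
}
```

Firing requests without waiting would let the number of outstanding requests grow without limit. Blocking on each response ties the load on ES directly to the thread count.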

CU
Thomas

--
…
CRAIG BROWN
chief architect
youwho, Inc.

www.youwho.com (http://www.youwho.com/)

T: 801.855.0921
M: 801.913.0939


Related topics

Topic	Category	Replies	Views	Last activity
Very slow ElasticSearch Index	Elasticsearch	8	409	July 6, 2017
Slow Indexing Speed	Elasticsearch	5	7254	July 6, 2017
Issue Indexing 50mil Docs via Bulk API	Elasticsearch	23	2496	July 5, 2017
Inserts get slower when index become large	Elasticsearch	10	489	July 6, 2017
Heavy indexing cause severe delay for searching	Elasticsearch	12	540	July 6, 2017