I have about 1500 processes, each issuing a bulk index request of 500
documents. These 1500 processes run in parallel (well, in batches of 600
or so), and they complete in just a couple of minutes. In essence, the Es
cluster is being hit with 1500 bulk index requests, each with 500
documents, over the course of a few minutes.
The first issue is that not all the bulk index requests return a proper
response. This may be a client and/or proxy issue. I opened a ticket with
Tire: https://github.com/karmi/tire/issues/740
The second problem is that after issuing all these bulk requests, the Es
cluster becomes very slow to respond to requests. For instance, a search
that simply queries the total number of documents in the index takes up
to 18 seconds.
My setup is a 2 node cluster: 30 shards, 0 replicas. Each node has 8
processors and 7.5 GB of memory. While the bulk indexing is going on, the
machines' cpu doesn't go above 60% utilization. The load doesn't go above
3. In other words, the machines look like they aren't even breaking a
sweat.
Why does the cluster become unresponsive (err, very slow) then?
They're asynchronous, cause I/O load and invalidate caches.
Warmers might help the cause:
Since you seem to have bursts of indexing, I would disable the warmers
while doing that, and enable them again once indexing is done. Otherwise,
warmers will slow down your indexing.
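
A rough sketch of what that could look like, assuming the cluster version
supports the dynamic index setting `index.warmer.enabled` through the
update-settings API (check the docs for your version). The base URL, the
index name and the `run_bulk` callable are placeholders, not anything from
the original setup.

```python
import json
import urllib.request


def warmer_settings_request(base_url, index, enabled):
    """Build the (url, body) pair for toggling warmers on one index.

    Assumes the `index.warmer.enabled` setting is available and dynamic
    in the Elasticsearch version in use.
    """
    url = "{}/{}/_settings".format(base_url.rstrip("/"), index)
    body = json.dumps({"index.warmer.enabled": enabled}).encode("utf-8")
    return url, body


def set_warmers(base_url, index, enabled):
    """Send the settings update to the cluster and return the parsed reply."""
    url, body = warmer_settings_request(base_url, index, enabled)
    req = urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def bulk_index_with_warmers_disabled(base_url, index, run_bulk, toggle=set_warmers):
    """Disable warmers, run the indexing burst, then always re-enable them.

    `run_bulk` is a hypothetical callable that performs the bulk requests.
    """
    toggle(base_url, index, False)
    try:
        return run_bulk()
    finally:
        # Re-enable warmers even if the indexing burst fails part-way.
        toggle(base_url, index, True)
```

Usage would be something like
`bulk_index_with_warmers_disabled("http://localhost:9200", "myindex", run_bulk)`,
called once around the whole burst rather than from each of the 1500
processes.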
On Sun, 26 May 2013 10:17:47 -0700 (PDT)
"Christopher J. Bottaro" cjbottaro@gmail.com wrote:
Hi,
I have about 1500 processes, each issuing a bulk index request of 500
documents. These 1500 processes run in parallel (well, in batches of
600 or so), and they complete in just a couple of minutes. In
essence, the Es cluster is being hit with 1500 bulk index requests,
each with 500 documents, over the course of a few minutes.
The second problem is that after issuing all these bulk requests, the
Es cluster becomes very slow to respond to requests. For instance,
doing a search simply to query the total number of documents in the
index, takes up to 18 seconds.
My setup is a 2 node cluster: 30 shards, 0 replicas. Each node has
8 processors and 7.5 GB of memory. While the bulk indexing is going
on, the machines' cpu doesn't go above 60% utilization. The load
doesn't go above 3. In other words, the machines look like they
aren't even breaking a sweat.
Why does the cluster become unresponsive (err, very slow) then?