Hi, I'm on elasticsearch master with two nodes and one index of 5 shards and 1
replica. I'm rivering in a CouchDB database of about 3GB, with 400,000
documents. It's going very slowly, something like 50 documents a minute. I
can see that CPU usage on the rivering node is maxed out, so it's working
pretty hard.
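To put those numbers in perspective, here is a quick back-of-envelope calculation. It assumes "about 3GB" means 3 x 10^9 bytes and that the 50 documents/minute rate stays constant. At that rate the full import would take about 8,000 minutes, roughly 133 hours or five and a half days, for an average document of about 7.5 KB.

```python
# Back-of-envelope numbers for the import described above.
# Assumption: "about 3GB" is taken as 3e9 bytes, and the rate is steady.
DOCS = 400_000          # documents in the CouchDB database
DB_BYTES = 3 * 10**9    # approximate database size in bytes
RATE_PER_MIN = 50       # observed indexing rate, documents per minute


def import_eta_hours(docs: int, rate_per_min: float) -> float:
    """Hours needed to index `docs` documents at a steady `rate_per_min`."""
    return docs / rate_per_min / 60


avg_doc_kb = DB_BYTES / DOCS / 1000
print(f"average doc size: ~{avg_doc_kb:.1f} KB")
print(f"time to finish: ~{import_eta_hours(DOCS, RATE_PER_MIN):.0f} hours")
```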
I just tried increasing the timeout to 1s, to no avail. The documents vary from
about 1k to 150k, with 100 distinct attributes overall, although each document
includes only a small number of them (about 10). I'm going to try a 30s
timeout now to see if that makes a difference, and will play around with the
bulk_size. Are there any other settings that I can tweak?
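For reference, the bulk_size and timeout settings mentioned above are set in the river's `_meta` document. The sketch below builds such a document in Python; the field layout (`couchdb.host/port/db/filter`, `index.index/type/bulk_size/bulk_timeout`) follows the elasticsearch-river-couchdb README as I remember it, so check it against the river version you are running. The database/index name `my_db` and the values 500 and 30s are hypothetical examples, not recommendations.

```python
import json


def couchdb_river_meta(db: str, index: str, bulk_size: int, bulk_timeout: str,
                       host: str = "localhost", port: int = 5984) -> dict:
    """Build a CouchDB river _meta document with explicit bulk settings.

    Assumption: key names follow the elasticsearch-river-couchdb README as
    recalled; verify them for your river version.
    """
    return {
        "type": "couchdb",
        "couchdb": {"host": host, "port": port, "db": db, "filter": None},
        "index": {
            "index": index,
            "type": db,                    # hypothetical: reuse the db name as the type
            "bulk_size": str(bulk_size),   # max documents per bulk request
            "bulk_timeout": bulk_timeout,  # wait for more changes before flushing a partial bulk
        },
    }


# Hypothetical names and values, for illustration only.
meta = couchdb_river_meta("my_db", "my_db", bulk_size=500, bulk_timeout="30s")
# This body would be PUT to http://localhost:9200/_river/my_db/_meta (e.g. with curl -d).
print(json.dumps(meta, indent=2))
```

Larger bulks mean fewer round trips per document, so bulk_size is usually the first knob to try; whether it helps here depends on where the CPU is actually going.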
On Thu, 2011-08-25 at 02:27 -0700, Harry Waye wrote:
How much memory have you allocated to ES? And have you made sure that
swap is disabled, either by turning swap off completely (swapoff) or by
using mlockall?
Swap is your enemy - as soon as any part of the heap is in swap, the JVM
will grind to a halt.
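A quick way to check the first option on Linux is to look at `/proc/swaps`: its first line is a header, and every following line is an active swap area. The sketch below parses that file; it assumes a Linux host. For the mlockall route, as far as I know ES releases of this era expose it as `bootstrap.mlockall: true` in elasticsearch.yml, which also needs the process to be allowed to lock memory (e.g. `ulimit -l unlimited`).

```python
from typing import List


def active_swap_devices(proc_swaps_text: str) -> List[str]:
    """Return the swap devices/files listed in Linux /proc/swaps content.

    The first line is a header; every following non-blank line is an
    active swap area. An empty list means swap is off.
    """
    lines = proc_swaps_text.strip().splitlines()
    return [line.split()[0] for line in lines[1:] if line.strip()]


if __name__ == "__main__":
    try:
        with open("/proc/swaps") as f:
            devices = active_swap_devices(f.read())
    except OSError:
        devices = None  # not Linux, or /proc is unavailable
    if devices is None:
        print("/proc/swaps not available; check swap another way")
    elif devices:
        print("swap is ON:", ", ".join(devices))
    else:
        print("swap is off")
```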
We've noticed that the river is pulling in _attachments as well, is it meant to be doing that?
On 25 August 2011 15:35, Harry Waye harry@arachnys.com wrote:
No, just on the rivering node as I recall. I've disbanded the group so I can't verify easily. I'll have to arrange a reunion later to test.
On 25 August 2011 15:29, Shay Banon kimchy@gmail.com wrote:
Wondering, do you see heavy CPU load also on the other node?
On Thu, Aug 25, 2011 at 1:40 PM, Harry Waye hwaye@microwayes.net wrote:
No change, still very slow
It doesn't pull in the attachments themselves, just the attachment metadata:
content_type, length, etc. Each attachment is assigned a hash, so you end up
with many, many fields, several for each attachment. I don't think you
can suppress the field; perhaps there's some way of using a view, but we're
just removing it on the Elasticsearch side for now.
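To illustrate why the attachment metadata multiplies fields: CouchDB returns attachments as stubs under `_attachments`, keyed by attachment name, each carrying several metadata keys, so every new attachment name adds a new group of field paths to a dynamic mapping. The sketch below counts flattened field paths for a hypothetical document (names and digests are made up) and strips `_attachments` before indexing. This is the same idea done client-side in Python, not the river's own mechanism.

```python
from typing import Any, Dict, Set


def field_paths(doc: Dict[str, Any], prefix: str = "") -> Set[str]:
    """Flatten a JSON-like document into dotted field paths, roughly how a
    dynamic mapping sees nested objects."""
    paths: Set[str] = set()
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            paths |= field_paths(value, path + ".")
        else:
            paths.add(path)
    return paths


def strip_attachments(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Return a shallow copy of a CouchDB document without `_attachments`."""
    return {k: v for k, v in doc.items() if k != "_attachments"}


# Hypothetical CouchDB document with two attachment stubs.
doc = {
    "_id": "report-1",
    "title": "Q2 report",
    "_attachments": {
        "summary.pdf": {"content_type": "application/pdf", "digest": "md5-aaa",
                        "length": 1024, "revpos": 2, "stub": True},
        "chart.png": {"content_type": "image/png", "digest": "md5-bbb",
                      "length": 2048, "revpos": 3, "stub": True},
    },
}

print(len(field_paths(doc)))                     # 12: two attachments add 10 paths
print(len(field_paths(strip_attachments(doc))))  # 2: only _id and title remain
```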