Thank you for the recommendations.
I am not saturating the network at all; I am CPU-bound on the ES cluster
side during the load phase. These servers are dedicated to ES and are
currently responsible only for the initial loading of the data.
Unfortunately I do not have the luxury of solid state storage, but I will
look at storage-layer optimizations when that becomes the bottleneck.
-Adi
On Wednesday, 13 November 2013 00:15:57 UTC-8, Jörg Prante wrote:
Please use the official python clients.
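For example, a minimal sketch of bulk indexing with the official
elasticsearch-py client and its bulk helper (the host, index and type names
here are just placeholders, not from your setup):

    # minimal sketch: bulk indexing with the official Python client
    # (pip install elasticsearch); host/index/type names are made up
    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch(['localhost:9200'])

    # a generator of actions, so the whole dataset never sits in memory
    actions = ({'_index': 'myindex', '_type': 'doc', '_source': {'n': i}}
               for i in xrange(1000))

    # helpers.bulk batches the actions into _bulk requests for you
    success, errors = helpers.bulk(es, actions)
    print success, errors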
Also monitor the network interface if you index from a remote host.
You have 15b+ docs, which I assume amounts to some GBs of data. If the
network is saturated and you can spare CPU cycles, use gzip compression on
the HTTP bulk requests.
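As a rough sketch of what that looks like with plain urllib2 (this assumes
the cluster listens on localhost:9200 and has http.compression enabled in
its settings):

    # minimal sketch: gzip-compress a _bulk body before sending it;
    # assumes http.compression is enabled on the cluster
    import gzip
    import io
    import urllib2

    bulk_body = ('{"index":{"_index":"myindex","_type":"doc","_id":"1"}}\n'
                 '{"field":"value"}\n')

    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode='wb') as gz:
        gz.write(bulk_body)

    req = urllib2.Request('http://localhost:9200/_bulk', buf.getvalue())
    req.add_header('Content-Encoding', 'gzip')
    req.add_header('Content-Type', 'application/json')
    print urllib2.urlopen(req).read()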
If you do not feel like using the official client, check whether httplib2
is a better choice than urllib2; it supports compression.

Also check that you use fast JSON encoding on the client side. ujson is a
fast drop-in replacement for the slow standard json python lib.
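Since ujson mirrors the stdlib json API for dumps/loads, the swap is a
one-line change; a sketch:

    # minimal sketch: ujson as a drop-in for the stdlib json module
    try:
        import ujson as json  # pip install ujson
    except ImportError:
        import json  # fall back to the slower stdlib encoder

    doc = {'user': 'test', 'count': 42}
    print json.dumps(doc)  # same call as the stdlib json.dumps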
For fast persisting in the cluster, use SSDs instead of spindle disks. On
file systems backed by spindle disks, you should disable atime updates
(noatime) in the Linux file system mount for the data dir for better I/O
throughput.

Jörg