Have you tried the bulk indexing API?
I'm not entirely familiar with PyES, but I think it implements the bulk API
too: Bulk load ElasticSearch using pyes | dave dash
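
For illustration, here is a rough sketch of what a bulk request looks like when built by hand with only the Python standard library, rather than through pyes. The bulk endpoint takes newline-delimited JSON: one action line followed by one document line per entry, with a trailing newline at the end. The index name `logs-test`, the type name `entry`, the helper names, the batch size, and the `localhost:9200` URL are all placeholders I made up for the example, so adjust them to your setup.

```python
import json
import urllib.request


def build_bulk_body(entries, index, doc_type):
    """Serialize entries into the newline-delimited body used by /_bulk."""
    lines = []
    for entry in entries:
        # Action line: index the document on the next line into index/doc_type.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(entry))
    # The bulk body has to end with a newline.
    return "\n".join(lines) + "\n"


def chunked(items, size):
    """Yield successive batches of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def send_bulk(body, url="http://localhost:9200/_bulk"):
    """POST one bulk body to ES (the URL is a placeholder for a local node)."""
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        # Newer ES releases insist on a content type; older ones ignore it.
        headers={"Content-Type": "application/x-ndjson"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def index_in_batches(entries, index, doc_type, batch_size=500):
    """Send entries in batches instead of one HTTP request per log line."""
    for batch in chunked(entries, batch_size):
        send_bulk(build_bulk_body(batch, index, doc_type))
```

Calling something like `index_in_batches(parsed_lines, "logs-test", "entry")` would then replace the per-entry insert loop. The batch size of 500 is only a starting guess; it is worth trying a few values and measuring.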
Also try some of the tips Shay recommends here:
https://groups.google.com/d/msg/elasticsearch/APWxRLrMOeU/HxZEyY0Yx_sJ
-Zach
On Monday, January 14, 2013 10:54:57 AM UTC-5, Omega Mike wrote:
I am currently testing ES as a replacement for MongoDB in a custom
centralized logging mechanism. Using Mongo, I am able to push entries
into the current instance at a rate of 500-800/second on average with peaks
of 1200/second. These are one-line log entries, broken down into JSON
objects by a Python program and inserted into a dedicated Mongo collection
for each remote logging host. All of this being said, I haven't been able
to squeeze much more performance out of MongoDB, aside from throwing more
hardware behind it (which is slightly frowned upon at the moment).
TL;DR: Basically, it seems as though ES will fit our purposes
more closely (especially in search performance). That being said, I have
set up an ES instance on the same hardware (MongoDB is shut down for testing)
and while the search performance seems great, for what I've been able to
insert so far, the actual inserting or indexing performance is nowhere near
adequate. I'm currently only able to insert around 25 entries per second,
obviously nowhere near the performance of MongoDB.
I haven't been able to find any great information on tuning the
performance of inserts in ES at all, so if anyone could point me to those
that'd be awesome. As for my current setup: I'm using 0.20.2
(installed with the typical extract and splat method on CentOS 6) and the
pyes Python library to interface with ES. The inserting program runs
locally on the same box, which is a VM with four cores @ 2.67GHz
and 4GB RAM. I'm not hitting any sort of disk limitation yet (the
back-end storage is hosted on the company SAN, which holds much of our
production environment) the way I have been with MongoDB.
Thanks in advance for any help anyone might be able to offer me.