I also had a similar requirement. I don't know if this solution will
work for you, but you can try an alternate approach.
Instead of indexing the documents directly, push them onto a message
queue (such as RabbitMQ).
Have consumers that keep reading from the queue and index each
document into ElasticSearch.
This way, by decoupling document generation from document indexing,
you need not worry about the rate at which your documents are being
created.
Also, since your documents seem to be small, they will not add much
overhead to the messaging system.
If you use a framework like Celery, this is handled very transparently
for you; you don't have to understand AMQP and similar protocols in
depth.
Assuming that you are doing this on a cloud setup, you may already
have access to a RabbitMQ setup.
On Wed, Aug 25, 2010 at 12:32 AM, elasticsearcher wrote:
I've searched around on the docs, and I haven't found a solution, so I
thought I'd ask here.
In my program, I generate many short documents to index very quickly (shall
we say, 1000 every few seconds, per thread, and I have many threads on many
nodes), and then insert them into ElasticSearch for indexing one-by-one
until they're gone. I believe this may be a bottleneck in my system.
Is there any way to index a large batch of documents at once (all of the
documents I've generated so far, say)?
I am currently using the REST API via python, but if this feature exists in
a different API instead, it is conceivable that I could incorporate it into
my program.
My document type looks like:
I'm imagining the slowdown is simply because I have to push thousands of
documents to the cloud one-by-one, even though I have large chunks of them
generated at once, and the overhead of the individual transfers/indexing is
the bottleneck.
View this message in context: http://elasticsearch-users.115913.n3.nabble.com/Indexing-multiple-things-at-once-Possible-tp1317722p1317722.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.