Indexing multiple things at once. Possible?

If you use the async (non-blocking) interface, you can index very fast even
when sending the docs one by one; a batch process is not really needed.
If you'll have 5 servers, I'd guess that 3-4K queued documents would not be
an issue; ES would easily keep up with that. We're able to index 1,000 docs
with 100 fields each on a single 4-core PC.

You can watch CPU/IO to throttle if necessary. Also, you may want to use the
blocking threadpool:
http://www.elasticsearch.com/docs/elasticsearch/modules/threadpool/blocking/
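The throttled one-by-one approach above can be sketched with a bounded queue feeding a fixed pool of sender threads. This is an illustrative client-side pattern, not an ES API: the worker count, queue size, and the injected `send_fn` (which would wrap your actual per-document HTTP request) are all assumptions.

```python
import queue
import threading

def index_concurrently(docs, send_fn, workers=4, max_queue=1000):
    """Feed documents to send_fn from a fixed pool of worker threads.

    The bounded queue throttles the producer: if the senders fall
    behind, put() blocks instead of piling up requests, which mirrors
    what a blocking threadpool does on the server side.
    """
    q = queue.Queue(maxsize=max_queue)
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            doc = q.get()
            if doc is None:          # sentinel: shut this worker down
                return
            resp = send_fn(doc)      # e.g. one HTTP index request per doc
            with lock:
                results.append(resp)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for doc in docs:
        q.put(doc)                   # blocks while the queue is full
    for _ in threads:
        q.put(None)                  # one sentinel per worker
    for t in threads:
        t.join()
    return results
```

In practice `send_fn` would do the REST call; here it is injected so the throttling structure stands on its own.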

Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype

On Wed, Sep 1, 2010 at 2:05 PM, elastic searcher
<elasticsearcher@gmail.com> wrote:

In essence, I have a large number of documents which are generated in
large quantities very quickly, and I'm looking for a way to index them as
fast as possible. I was wondering if there is a way to index, say, a batch
of documents more quickly than indexing each document individually.

If this isn't something feasible, would it be possible for me to split
off a few threads to help send indexing requests to elasticsearch more
quickly? My question is really aimed at understanding how
elasticsearch deals with indexing requests. Would it be advantageous
to issue as many indexing requests as possible, or would elasticsearch
start to get overloaded? (Or, at what point would elasticsearch get
overloaded?)

For instance, my cluster is currently running on five regular old PCs
(servers coming soon), each with 2GB RAM (1GB allocated to ES), a dual-core
Intel CPU, etc. My program, running on each node, will be generating lists
of, shall we say for simplicity, 1000 documents, essentially as fast as it
can. After generating the 1000 documents, it currently submits them
one-by-one to Elasticsearch for indexing until the documents are all gone.
It then generates a new 1000 documents and repeats the process. My program
already has multiple threads which could all be generating sets of 1000
documents at once, so maybe 3000-4000 documents are queued up at any time
on each node.

Since I'm not very familiar with how ES actually does the indexing,
I'm really just looking for advice on how to get my large number of
documents indexed as quickly as possible.

On Aug 27, 3:52 pm, Shay Banon <shay.ba...@elasticsearch.com> wrote:

It's important to understand where the bottleneck is. When you say you index
documents "into" the cloud, what do you mean? Is that a WAN call?

On Tue, Aug 24, 2010 at 10:02 PM, elasticsearcher
<elasticsearc...@gmail.com> wrote:

I've searched around on the docs, and I haven't found a solution, so I
thought I'd ask here.

In my program, I generate many short documents to index very quickly (shall
we say, 1000 every few seconds, per thread, and I have many threads on many
nodes), and then insert them into Elasticsearch for indexing one-by-one
until they're gone. I believe this may be a bottleneck in my system.

Is there any way to index a large batch of documents at once (all of the
same type)?

I am currently using the REST API via Python, but if this feature exists in
a different API instead, it is conceivable that I could incorporate it into
my program.

My document type looks like:

{
  "Name1": ...,
  "Name2": ...,
  "Percent": ...
}
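For documents of this shape, batching would mean sending many of them in one request. Later Elasticsearch releases expose a `_bulk` REST endpoint that accepts newline-delimited JSON (one action line, then one source line, per document); a minimal sketch of assembling such a payload, where the index/type names and field values are made-up placeholders:

```python
import json

def build_bulk_body(docs, index="myindex", doc_type="mytype"):
    """Build a newline-delimited JSON body for the _bulk endpoint:
    an action line followed by a source line for each document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"   # _bulk requires a trailing newline

docs = [
    {"Name1": "a", "Name2": "b", "Percent": 42.0},
    {"Name1": "c", "Name2": "d", "Percent": 17.5},
]
body = build_bulk_body(docs)
# POST body to the cluster's /_bulk URL with any HTTP client
```

One request per batch replaces thousands of per-document round trips, which is exactly the transfer overhead described below.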

I'm imagining the slowdown is simply because I have to push thousands of
documents to the cloud, one-by-one, even though I have large chunks of them
generated at once, and the overhead of individual transfers/indexing is the
bottleneck.

View this message in context:
http://elasticsearch-users.115913.n3.nabble.com/Indexing-multiple-thi...
Sent from the Elasticsearch Users mailing list archive at Nabble.com.