Just Pushed: Bulk API


(Shay Banon) #1

Hi,

Just pushed support for bulk operations; see more here:
http://github.com/elasticsearch/elasticsearch/issues/issue/371 (pay
attention to the body format). Over the past few versions I have been
slowly preparing elasticsearch to support it, and I finally came up
with a performant REST format that I am happy with.
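
A minimal sketch of what building such a request body might look like. It assumes the newline-delimited layout in which each action line is followed by its document source on the next line; the index name `test`, type `doc`, and the `bulk_body` helper are hypothetical and only for illustration, so check the linked issue for the authoritative format.

```python
import json

def bulk_body(items):
    """Serialize (action, metadata, source) triples into a newline-delimited body.

    Assumed layout: one JSON action line, then one JSON source line per document.
    """
    lines = []
    for action, meta, source in items:
        lines.append(json.dumps({action: meta}))  # action and metadata line
        lines.append(json.dumps(source))          # document source line
    # Every line, including the last one, ends with a newline.
    return "\n".join(lines) + "\n"

# Hypothetical index/type names and documents.
body = bulk_body([
    ("index", {"_index": "test", "_type": "doc", "_id": "1"}, {"title": "first"}),
    ("index", {"_index": "test", "_type": "doc", "_id": "2"}, {"title": "second"}),
])
print(body, end="")
```

Keeping each action and each source on its own line means the body can be split on newlines without parsing the whole request as one JSON document.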

Some very initial numbers, using the Java API on a two-node cluster with
a single index with 2 shards plus replicas, all on the local machine. A
single-threaded client using the single-document index API gets around 3k
docs per second. Using the bulk API with a batch size of 100, it reaches up
to 22k docs per second :wink: (still a single thread). I would love to hear
numbers from users on this.
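
For anyone reproducing the comparison, here is a rough sketch of grouping documents into batches of 100 before sending them. The `batches` helper and the sample documents are hypothetical; this is not the benchmark code behind the numbers above.

```python
from itertools import islice

def batches(docs, size=100):
    """Yield successive lists of at most `size` documents from any iterable."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# 250 hypothetical documents split into bulk-sized groups.
docs = ({"n": i} for i in range(250))
sizes = [len(chunk) for chunk in batches(docs)]
print(sizes)  # [100, 100, 50]
```

Each yielded chunk would become one bulk request, so the client makes one round trip per 100 documents instead of one per document.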

-shay.banon


#2

Nice!

On 15 Sep 2010, at 11:27, Shay Banon shay.banon@elasticsearch.com wrote:



(jamster) #3

Exciting... will try to get some sample runs in ASAP and will report back
with numbers.

On Wed, Sep 15, 2010 at 6:27 AM, Shay Banon shay.banon@elasticsearch.com wrote:



(Lukáš Vlček) #4

Hi,

I suppose the difference between the index and create actions is that a
create action is an index operation with the op_type parameter set to create
(http://www.elasticsearch.com/docs/elasticsearch/rest_api/index/#Operation_Type).
Is that correct?
As for the REST endpoint, an example URL would be
http://localhost:9200/_bulk, right?

Lukas

On Wed, Sep 15, 2010 at 12:27 PM, Shay Banon shay.banon@elasticsearch.com wrote:



(Shay Banon) #5

Yes on both.
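
To make the two points concrete, here is a small sketch that builds one create action and one index action and prepares a POST to the `_bulk` URL from the question. The `action_line` helper, the index name `test`, and the type `doc` are hypothetical, and the Content-Type header is an assumption that may only matter on later Elasticsearch versions. The request is constructed but not sent, so the sketch runs without a node.

```python
import json
import urllib.request

BULK_URL = "http://localhost:9200/_bulk"  # endpoint from the question above

def action_line(op, index, doc_type, doc_id):
    """Build the action line for an 'index' or 'create' bulk operation."""
    if op not in ("index", "create"):
        raise ValueError("unsupported action: %s" % op)
    return json.dumps({op: {"_index": index, "_type": doc_type, "_id": doc_id}})

payload = "\n".join([
    # create: an index operation with op_type=create (fails if the id exists)
    action_line("create", "test", "doc", "1"),
    json.dumps({"title": "only if absent"}),
    # index: adds the document or replaces an existing one
    action_line("index", "test", "doc", "1"),
    json.dumps({"title": "add or replace"}),
]) + "\n"

request = urllib.request.Request(
    BULK_URL,
    data=payload.encode("utf-8"),
    headers={"Content-Type": "application/x-ndjson"},  # assumption, see lead-in
    method="POST",
)
# urllib.request.urlopen(request) would send it to a running node.
print(request.get_method(), request.full_url)
```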

On Wed, Sep 22, 2010 at 12:02 AM, Lukáš Vlček lukas.vlcek@gmail.com wrote:


