ElasticSearch bulk api performance

allwefantasy · July 8, 2011, 6:10am

Here is my code using the bulk API:
https://gist.github.com/1071233

Single node (node info and performance log):
https://gist.github.com/1071237

I find it really slow: 20 documents per second! If I start two nodes on different machines,
it drops to only 9 documents per second.
Does anyone know why?

imarcticblue · July 8, 2011, 5:15pm

How large are your documents and how many are you indexing at a time? We're not using a DataItem, just raw JSON, and we can index 40M records in about 3 hours 30 minutes at 3200 docs/sec. Average record size is 0.5 KB. This is on AWS with 2 large nodes, 32 shards, 2 replicas. Your hardware looks beefier than what we have on AWS.

Craig

jrawlings · July 8, 2011, 9:25pm

Are you indexing to a local ES node? How large are the DataItems?

In my experience doing bulk prepares to a non-local ES node, I
maxed out my network connection (10mb) quite quickly.

allwefantasy · July 9, 2011, 3:02pm

The DataItem class contains id and source fields; source is the raw JSON of a blog article. 3200 docs/sec is really awesome! I still have no idea why it is so slow in my application.

allwefantasy · July 9, 2011, 3:08pm

Yes, I index to a local node. Each DataItem contains one blog article, and I bulk index 1000 DataItems at a time.

allwefantasy · July 9, 2011, 3:11pm

6 million blog articles will be indexed, and the index files total 62G.
1000 DataItems are indexed at a time.
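
As a rough sanity check on the figures quoted in this thread, the arithmetic can be sketched in Java. The class and method names are made up for illustration, and treating 62G as the total size of the 6 million articles (with 1 GB = 1024 * 1024 KB) is an assumption, not something the posts confirm.

```java
// Back-of-the-envelope figures from this thread.
// Assumptions: 62G is the total size of all documents, 1 GB = 1024 * 1024 KB,
// and the indexing rate stays constant for the whole run.
class ThroughputEstimate {
    // Average size per document, in KB.
    static double avgDocKb(double totalGb, long docCount) {
        return totalGb * 1024 * 1024 / docCount;
    }

    // Hours needed to index docCount documents at a steady rate.
    static double hoursToIndex(long docCount, double docsPerSecond) {
        return docCount / docsPerSecond / 3600.0;
    }

    public static void main(String[] args) {
        System.out.printf("avg doc size: %.1f KB%n", avgDocKb(62, 6_000_000L));
        System.out.printf("6M docs at 20/s: %.1f h%n", hoursToIndex(6_000_000L, 20));
        System.out.printf("6M docs at 3200/s: %.2f h%n", hoursToIndex(6_000_000L, 3200));
        System.out.printf("40M docs at 3200/s: %.2f h%n", hoursToIndex(40_000_000L, 3200));
    }
}
```

Under those assumptions, 6 million articles would take roughly 83 hours at 20 docs/sec and about half an hour at 3200 docs/sec.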

Craig_Brown · July 11, 2011, 11:38pm

So your docs are about 10K each? Are you doing any other kind of
transformation on your data? The code you showed is virtually the same
as mine, but I'm not using a DataItem. My data files are JSON, one document per
row. I simply read each row, set the id and source, then use
client.prepareIndex(). We index 10,000 docs at a time. The files
contain 20M docs and are compressed with gzip. It's basically
just as fast to read from compressed files, plus you get much smaller
files to push around.

Craig
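
The read-and-batch part of the approach described above might look roughly like the following. This is only a sketch: the class name, the method name, and the `sink` callback are hypothetical, and `sink` merely stands in for building and executing one bulk request per batch (it is not an Elasticsearch API). The real code in this thread is in the gist and may differ.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzBatchReader {
    // Reads gzip-compressed text, one JSON document per line, and hands the
    // lines to `sink` in batches of at most `batchSize`. Returns the number
    // of documents read. `sink` is a hypothetical placeholder for the code
    // that would send one bulk request per batch.
    static long forEachBatch(InputStream gzipped, int batchSize,
                             Consumer<List<String>> sink) throws IOException {
        long total = 0;
        List<String> batch = new ArrayList<>(batchSize);
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(gzipped), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) continue; // skip blank rows
                batch.add(line);
                total++;
                if (batch.size() == batchSize) {
                    sink.accept(batch);
                    batch = new ArrayList<>(batchSize); // start a fresh batch
                }
            }
        }
        if (!batch.isEmpty()) sink.accept(batch); // flush the final partial batch
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Build a small gzipped sample in memory: 25 one-line JSON documents.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(
                new GZIPOutputStream(buf), StandardCharsets.UTF_8)) {
            for (int i = 0; i < 25; i++) w.write("{\"id\":" + i + "}\n");
        }
        long n = forEachBatch(new ByteArrayInputStream(buf.toByteArray()), 10,
                batch -> System.out.println("would bulk-index " + batch.size() + " docs"));
        System.out.println("read " + n + " docs");
    }
}
```

With a real file, the in-memory sample would be replaced by a file input stream and the batch size raised to something like the 10,000 mentioned above.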
