Bulk Processor question


(IronMike) #1

I don't quite understand what the bulk processor is doing, and I would like
someone to explain how it is supposed to work so I can make sure I designed this
correctly.
I set the number of actions to 1000.
My feeder keeps pushing documents to it (it's basically a loop iterating over
document folders), adding each document to the bulk. I expected the
bulk to queue documents until it reached 1000, and then process the bulk.

Yet this is what gets logged; these lines come from the callback functions of
the bulk processor.

Bulk Called: ID= 1, Actions=33, MB=5.46250
Bulk Called: ID= 2, Actions=29, MB=5.51660
Bulk Succeeded: ID= 1, took= 921 ms
Bulk Called: ID= 3, Actions=12, MB=5.691812
Bulk Succeeded: ID= 2, took= 1526 ms

.....

Bulk Called: ID= 23, Actions=8, MB=5.45294
Bulk Succeeded: ID= 23, took= 751 ms
Bulk Called: ID= 24, Actions=19, MB=5.383918
Bulk Succeeded: ID= 24, took= 331 ms
Bulk Called: ID= 25, Actions=22, MB=5.347542
Bulk Succeeded: ID= 25, took= 694 ms
Bulk Called: ID= 26, Actions=58, MB=5.249195
Bulk Succeeded: ID= 26, took= 583 ms
Bulk Called: ID= 27, Actions=89, MB=5.244396
Bulk Succeeded: ID= 27, took= 588 ms.....

Bulk Called: ID= 47, Actions=17, MB=5.245771 ...

Bulk Succeeded: ID= 47, took= 431 ms

Finished Processing the whole thing

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/a3131fe6-5183-495f-8658-21b276a72eb6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Jörg Prante) #2

BulkProcessor has two flush thresholds: the number of actions (which you set
to 1000) and the byte volume of the bulk request (default 5 MB). Whichever is
reached first triggers the bulk. What you see is the 5 MB limit kicking in;
your docs are quite large.

Jörg
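To see why the bulks in the log hold far fewer than 1000 actions, here is a minimal, self-contained model of the "whichever threshold comes first" rule. It is not the real BulkProcessor class: the class name `BulkThresholdModel`, the method `batch`, and the 200 KB document sizes are hypothetical, 5 MB is taken as 5 × 1024 × 1024 bytes, and the real processor measures an estimated request size rather than raw document bytes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of BulkProcessor's two flush thresholds; not the real class.
public class BulkThresholdModel {

    // Groups documents into bulks and returns the number of actions in each bulk.
    // A bulk is flushed as soon as EITHER threshold is reached; -1 disables a threshold.
    static List<Integer> batch(long[] docSizes, int bulkActions, long bulkSizeBytes) {
        List<Integer> batches = new ArrayList<>();
        int actions = 0;
        long bytes = 0;
        for (long size : docSizes) {
            // The check happens after each document is added to the pending bulk.
            actions++;
            bytes += size;
            boolean overActions = bulkActions != -1 && actions >= bulkActions;
            boolean overBytes = bulkSizeBytes != -1 && bytes >= bulkSizeBytes;
            if (overActions || overBytes) {
                batches.add(actions);
                actions = 0;
                bytes = 0;
            }
        }
        if (actions > 0) {
            // Leftover documents form a final, smaller bulk (sent when the processor is closed).
            batches.add(actions);
        }
        return batches;
    }

    public static void main(String[] args) {
        long fiveMb = 5L * 1024L * 1024L;
        // Hypothetical feed: 100 documents of 200 KB each (e.g. PDFs).
        long[] docs = new long[100];
        Arrays.fill(docs, 200L * 1024L);
        // 1000 actions or 5 MB, whichever comes first: the 5 MB limit wins.
        System.out.println(batch(docs, 1000, fiveMb)); // [26, 26, 26, 22]
        // Size threshold disabled (-1): only the action count can trigger a flush.
        System.out.println(batch(docs, 1000, -1));     // [100]
    }
}
```

With 200 KB documents, 25 of them stay just under 5 MB and the 26th pushes the bulk over, so every bulk is sent long before 1000 actions, the same pattern as the log above, where each bulk ends slightly past 5 MB.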

On Wed, Mar 12, 2014 at 8:54 PM, ZenMaster80 sabdalla80@gmail.com wrote:



(IronMike) #3

My docs vary in size: some are very small, others are PDFs like the ones in
the log above. How do you suggest I handle this, since I don't know in advance
whether a doc will be small or large?

On Wednesday, March 12, 2014 4:01:53 PM UTC-4, Jörg Prante wrote:

BulkProcessor has two thresholds, the number of actions (as you use by
setting it to 1000) or a bulk request byte volume (default 5M). What you
see is the 5M limit kicking in, your docs are quite large.

Jörg



(Binh Ly-2) #4

If you want to flush bulks by action count only, set the bulk size threshold
to -1, which disables it. Here is an example (see the bulkIndexByActions method):
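Here is a small sketch of what the -1 setting changes, applied to numbers from the log. In the 1.x Java API the size threshold is, as far as I know, set through the builder's `setBulkSize(new ByteSizeValue(-1))`; the helper `shouldFlush` and class `FlushCheck` below are hypothetical and only approximate the check the processor makes after each added request. Treating the logged "MB" value as 1024 × 1024 bytes is also an assumption.

```java
// Hypothetical helper approximating BulkProcessor's flush check; not the library code.
public class FlushCheck {

    // Returns true when the pending bulk should be sent. A threshold of -1 is disabled.
    static boolean shouldFlush(int actions, long bytes, int bulkActions, long bulkSizeBytes) {
        boolean overActions = bulkActions != -1 && actions >= bulkActions;
        boolean overBytes = bulkSizeBytes != -1 && bytes >= bulkSizeBytes;
        return overActions || overBytes;
    }

    public static void main(String[] args) {
        long fiveMb = 5L * 1024L * 1024L;
        // Roughly bulk ID 1 from the log: 33 actions, about 5.46 MB.
        long bulk1Bytes = (long) (5.4625 * 1024 * 1024);
        System.out.println(shouldFlush(33, bulk1Bytes, 1000, fiveMb)); // true: size limit reached
        System.out.println(shouldFlush(33, bulk1Bytes, 1000, -1));     // false: size check disabled
        System.out.println(shouldFlush(1000, bulk1Bytes, 1000, -1));   // true: action count reached
    }
}
```

One trade-off to keep in mind: with the size cap disabled, a bulk of 1000 large PDFs is buffered and sent as a single request, so a smaller action count may be safer when documents can be large.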


