All IndexRequestBuilders in same thread use the same source array?


(jacorob) #1

I'm trying to process a number of IndexRequests through the Bulk API
in 0.15.2 using the Java API. To do so I have a loop that first
creates all the IndexRequestBuilders and stores them in a list. At a
later time I then iterate through the list and add each
IndexRequestBuilder to a particular bulk request. The documents I'm
indexing will all generate the exact same size source, but the content
of the source will be different (different constant-size GUIDs, etc.).

Each individual IndexRequestBuilder is generated correctly. However,
setting the source on a new IndexRequestBuilder changes the source on
EVERY IndexRequestBuilder that was created before it. Looking at the
IndexRequestBuilder instances, all the builders use the exact same
source instance (a byte array).

Is this a bug or as designed? If as designed, why? Am I mistaken in
that this limits the number of IndexRequestBuilders that you can have
instantiated at one time to one per thread?

Thanks,
Bob


(Shay Banon) #2

This is by design in order to reduce array copies when using single index requests. When you add an index request builder to a bulk request, it gets copied over. What you need to do is keep the bulk request around and keep adding to it, instead of keeping the index builders around.
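
A minimal sketch of the effect described above, using plain Java rather than the Elasticsearch classes. Everything here (`SharedBufferDemo`, `Request`, `Bulk`, `render`) is a hypothetical stand-in, and the per-thread array is only an assumption about how such a reusable buffer could behave. It shows why requests that are held onto all end up with the last document, and why copying at the moment a request is added to a bulk container avoids that.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are not Elasticsearch classes.
final class SharedBufferDemo {

    // One reusable buffer per thread (assumed fixed size, like the same-size documents in the question).
    private static final ThreadLocal<byte[]> BUFFER = ThreadLocal.withInitial(() -> new byte[5]);

    /** A request that keeps a reference to the array it was given, without copying. */
    static final class Request {
        final byte[] source;
        Request(byte[] source) { this.source = source; }
    }

    /** A bulk container that copies each request's source when the request is added. */
    static final class Bulk {
        final List<byte[]> sources = new ArrayList<>();
        void add(Request request) { sources.add(request.source.clone()); }
    }

    /** Writes the document into this thread's shared buffer and returns that same buffer. */
    static byte[] render(String doc) {
        byte[] bytes = doc.getBytes(StandardCharsets.UTF_8);
        byte[] buffer = BUFFER.get();
        if (bytes.length != buffer.length) {
            throw new IllegalArgumentException("sketch assumes fixed-size documents");
        }
        System.arraycopy(bytes, 0, buffer, 0, buffer.length);
        return buffer;
    }

    /** Pattern from the question: keep the requests in a list and read them later. */
    static List<String> heldRequests() {
        List<Request> held = new ArrayList<>();
        held.add(new Request(render("doc-1")));
        held.add(new Request(render("doc-2"))); // overwrites the bytes the first request points at
        List<String> result = new ArrayList<>();
        for (Request request : held) {
            result.add(new String(request.source, StandardCharsets.UTF_8));
        }
        return result;
    }

    /** Suggested pattern: add each request to the bulk container right away. */
    static List<String> copiedOnAdd() {
        Bulk bulk = new Bulk();
        bulk.add(new Request(render("doc-1"))); // copied before the buffer is reused
        bulk.add(new Request(render("doc-2")));
        List<String> result = new ArrayList<>();
        for (byte[] source : bulk.sources) {
            result.add(new String(source, StandardCharsets.UTF_8));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(heldRequests()); // [doc-2, doc-2]
        System.out.println(copiedOnAdd());  // [doc-1, doc-2]
    }
}
```

The only difference between the two methods is when the bytes are copied: never in the first, and at add time in the second.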


(jacorob) #3

Thanks, Shay. As soon as I figured out what was going on I started passing
my bulk request around everywhere. Fixed things right up.

If you can only realistically ever have 1 active instance of an
IndexRequestBuilder per thread, why do you allow users to create multiple
instances of IndexRequestBuilder in the same thread in the first place?
Seems like that just asks for people to screw it up like I did. Are there
use cases where one would actually want/be able to have such behavior across
multiple instances?

Bob


(Shay Banon) #4

Well, it's not really the index request builder, it's the XContentBuilder used. By default, you can use XContentFactory#jsonBuilder, but you can also use XContentFactory#safeJsonBuilder, which will not use a thread-local buffer.
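
To illustrate the difference between a builder backed by a thread-local buffer and a "safe" one, here is a rough sketch in plain Java. The names `BuilderChoiceDemo`, `Builder`, `cachedBuilder` and `safeBuilder` are hypothetical and are not the Elasticsearch implementation. The sketch only assumes that the cached variant resets and reuses one stream per thread, while the safe variant allocates a fresh one each time.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-ins for illustration only; these are not Elasticsearch classes.
final class BuilderChoiceDemo {

    /** A tiny builder that appends text to whatever stream backs it. */
    static final class Builder {
        private final ByteArrayOutputStream out;

        Builder(ByteArrayOutputStream out) { this.out = out; }

        Builder value(String text) {
            byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
            out.write(bytes, 0, bytes.length);
            return this;
        }

        String string() {
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    // One reusable stream per thread, shared by every cached builder on that thread.
    private static final ThreadLocal<ByteArrayOutputStream> CACHED =
            ThreadLocal.withInitial(ByteArrayOutputStream::new);

    /** Cached variant: avoids allocation, but a new builder wipes what the previous one wrote. */
    static Builder cachedBuilder() {
        ByteArrayOutputStream out = CACHED.get();
        out.reset();
        return new Builder(out);
    }

    /** Safe variant: each builder owns its own stream, at the cost of an allocation. */
    static Builder safeBuilder() {
        return new Builder(new ByteArrayOutputStream());
    }

    /** Creates two builders in a row and returns what the FIRST one holds afterwards. */
    static String firstAfterSecond(boolean safe) {
        Builder first = (safe ? safeBuilder() : cachedBuilder()).value("first");
        Builder second = (safe ? safeBuilder() : cachedBuilder()).value("second");
        return first.string();
    }

    public static void main(String[] args) {
        System.out.println(firstAfterSecond(false)); // second
        System.out.println(firstAfterSecond(true));  // first
    }
}
```

The trade-off is the one described in this thread: reuse saves copies and allocations when only one request is in flight, while a fresh buffer is what you want when several builders have to stay alive on the same thread.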


(jacorob) #5

Ah. Makes sense. Thanks again, Shay! I'm constantly amazed at how fast you
respond to questions on this list!

Bob

