Java Bulk API & creating index


(pzelnip) #1

Are there any examples anywhere showing how to use the bulk Java API? There are examples for the REST API on the website, but nothing for the Java side. Basically, what I want to do is index a bunch of documents at once, and create the index if it does not already exist.

Currently I have (DataItem is the class which holds a document to index, items is a List, and buildJSON() is a method for building the JSON for a DataItem):

BulkRequestBuilder brb = client.prepareBulk();
for (DataItem di : items) {
    brb.add(
        client.prepareIndex(indexName, typeName, di.get_id())
            .setSource(buildJSON(di.getDataItems()))
    );
}
brb.execute().actionGet();

This seems to work if the index already exists. However, this code is essentially me guessing at how the Java bulk API is supposed to be used, as there is no documentation for the Java bulk API on the website. So I have 3 questions:

  1. (newbie-ish question) Is the above code correct/am I doing it wrong?
  2. How do I (using the Java API) create the index before handing it off to the bulk API?
  3. (less newbie-ish) how many documents can/should I throw off to the bulk API at once? As it stands I'm doing them all at once (I'm only prototyping at this point, so collections are small), but how well will it scale? Is thousands ok? Tens of thousands? Hundreds of thousands? Millions?

Thanks.


(Shay Banon) #2

On Friday, February 25, 2011 at 12:00 AM, pzelnip wrote:

  1. (newbie-ish question) Is the above code correct/am I doing it wrong?
    Yep, looks good. You can also do brb.add(Requests.indexRequest(...)).

  2. How do I (using the Java API) create the index before handing it off to
    the bulk API?
    There is a create index API. client.admin().indices().prepareCreate("index_name")...

  3. (less newbie-ish) how many documents can/should I throw off to the bulk
    API at once? As it stands I'm doing them all at once (I'm only prototyping
    at this point, so collections are small), but how well will it scale? Is
    thousands ok? Tens of thousands? Hundreds of thousands? Millions?
    The whole bulk request has to fit in memory while it is being indexed, so make sure you don't put too many documents in a single bulk request.
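
One way to act on the memory advice above is to split the documents into bounded batches and send one bulk request per batch. The following is only a minimal sketch under assumptions: it assumes each document has already been serialized to a JSON string, and `BulkBatcher`, `partition`, `maxDocs`, and `maxBytes` are hypothetical names that are not part of the Elasticsearch API. Suitable limits depend on document size and heap, so they are left to the caller.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not an Elasticsearch class) that bounds the size of each bulk request. */
final class BulkBatcher {
    private BulkBatcher() {}

    /**
     * Splits already-serialized JSON documents into batches that hold at most
     * maxDocs documents and at most maxBytes of UTF-8 source each.
     * A single document larger than maxBytes still gets a batch of its own.
     */
    static List<List<String>> partition(List<String> jsonDocs, int maxDocs, long maxBytes) {
        if (maxDocs < 1 || maxBytes < 1) {
            throw new IllegalArgumentException("maxDocs and maxBytes must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0;
        for (String doc : jsonDocs) {
            long size = doc.getBytes(StandardCharsets.UTF_8).length;
            boolean wouldOverflow = current.size() >= maxDocs || currentBytes + size > maxBytes;
            // Close the current batch before it exceeds either limit.
            if (!current.isEmpty() && wouldOverflow) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(doc);
            currentBytes += size;
        }
        // Keep the final, partially filled batch.
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

Each returned batch would then get its own `client.prepareBulk()` request, built and executed the same way as the loop in the original question, so that only one batch has to be held in memory at a time.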

Thanks.

View this message in context: http://elasticsearch-users.115913.n3.nabble.com/Java-Bulk-API-creating-index-tp2571056p2571056.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.


(Jacob Perkins) #3

The bulk loader we wrote uses the Java bulk API, see:

http://github.com/infochimps/wonderdog

It should be pretty straightforward to navigate through as the amount
of code is rather minimal (thanks to elasticsearch's awesome api)

--jacob
@thedatachef


(pzelnip) #4

Wow, that's pretty impressive that so much is done with so little.
Thanks for pointing me in the direction of it, that code will most
definitely be helpful.

Adam

On Feb 27, 7:10 am, Jacob Perkins jacob.a.perk...@gmail.com wrote:

The bulk-loader we wrote uses the java bulk api, see:

http://github.com/infochimps/wonderdog

It should be pretty straightforward to navigate through as the amount
of code is rather minimal (thanks to elasticsearch's awesome api)

--jacob
@thedatachef

