Java Bulk API & creating index

pzelnip · February 24, 2011, 10:00pm

Are there any examples anywhere of using the bulk Java API? There are examples for the REST API on the website, but nothing for the Java side. Basically, what I want to do is index a bunch of documents at once, and create the index if it does not already exist.

Currently I have (DataItem is the class which holds a document to index, items is a List, and buildJSON() is a method for building the JSON for a DataItem):

BulkRequestBuilder brb = client.prepareBulk();
for (DataItem di : items) {
    brb.add(
        client.prepareIndex(indexName, typeName, di.get_id())
            .setSource(buildJSON(di.getDataItems()))
    );
}
brb.execute().actionGet();

Which seems to work if the index already exists. However, this code is essentially me guessing at how the Java bulk API is supposed to be used, as there is no documentation for the Java bulk API on the website. So I have 3 questions:

1. (newbie-ish question) Is the above code correct/am I doing it wrong?
2. How do I (using the Java API) create the index before handing it off to the bulk API?
3. (less newbie-ish) How many documents can/should I throw off to the bulk API at once? As it stands I'm doing them all at once (I'm only prototyping at this point, so collections are small), but how well will it scale? Is thousands ok? Tens of thousands? Hundreds of thousands? Millions?

Thanks.

kimchy · February 26, 2011, 8:40pm

On Friday, February 25, 2011 at 12:00 AM, pzelnip wrote:

(newbie-ish question) Is the above code correct/am I doing it wrong?
Yep, looks good. You can also do brb.add(Requests.indexRequest(...)).

How do I (using the Java API) create the index before handing it off to
the bulk API?
There is a create index API. client.admin().indices().prepareCreate("index_name")...
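
A minimal sketch of the create-before-bulk step, not code from this thread: it checks for the index and creates it only when it is missing. IndexAdmin, InMemoryIndexAdmin and ensureIndex are hypothetical names introduced for illustration; they stand in for whatever existence check your client version offers and for the prepareCreate call mentioned above. The in-memory implementation exists only so the sketch runs on its own.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical abstraction (not an Elasticsearch type) over the two admin calls needed.
interface IndexAdmin {
    boolean exists(String indexName);

    // With a real client this would wrap something like
    // client.admin().indices().prepareCreate(indexName).execute().actionGet()
    void create(String indexName);
}

// In-memory stand-in so the example is self-contained.
final class InMemoryIndexAdmin implements IndexAdmin {
    private final Set<String> indices = new HashSet<>();

    @Override
    public boolean exists(String indexName) {
        return indices.contains(indexName);
    }

    @Override
    public void create(String indexName) {
        // Mimic a server that rejects creating an index twice.
        if (!indices.add(indexName)) {
            throw new IllegalStateException("index already exists: " + indexName);
        }
    }
}

final class IndexSetup {
    private IndexSetup() {}

    // Creates the index only if it is missing; returns true if this call created it.
    static boolean ensureIndex(IndexAdmin admin, String indexName) {
        if (admin.exists(indexName)) {
            return false;
        }
        admin.create(indexName);
        return true;
    }
}
```

Call ensureIndex once before building the bulk request. Check-then-create is not atomic, so if several processes may start at the same time, be prepared for the create call to fail because another process got there first.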

(less newbie-ish) how many documents can/should I throw off to the bulk
API at once? As it stands I'm doing them all at once (I'm only prototyping
at this point, so collections are small), but how well will it scale? Is
thousands ok? Tens of thousands? Hundreds of thousands? Millions?
There has to be enough memory to hold the entire bulk request while it is being indexed. So, you should make sure that you don't put too many documents in a single bulk request.
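
One way to keep each bulk request bounded, sketched under assumptions rather than taken from this thread: split the documents into fixed-size batches and send one bulk request per batch. BulkBatches and partition are hypothetical helper names, and the right batch size depends on document size and available heap, so it is something to measure rather than a number to copy.

```java
import java.util.ArrayList;
import java.util.List;

final class BulkBatches {
    private BulkBatches() {}

    // Splits items into consecutive batches of at most batchSize elements,
    // preserving order. The last batch may be smaller.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    // Intended use with the code from the question (batchSize is an assumption):
    //
    // for (List<DataItem> batch : BulkBatches.partition(items, batchSize)) {
    //     BulkRequestBuilder brb = client.prepareBulk();
    //     for (DataItem di : batch) {
    //         brb.add(client.prepareIndex(indexName, typeName, di.get_id())
    //                       .setSource(buildJSON(di.getDataItems())));
    //     }
    //     brb.execute().actionGet();
    // }
}
```

Building a fresh bulk request per batch means only one batch of documents has to fit in memory at a time, however large the full collection is.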

Thanks.

View this message in context: http://elasticsearch-users.115913.n3.nabble.com/Java-Bulk-API-creating-index-tp2571056p2571056.html
Sent from the Elasticsearch Users mailing list archive at Nabble.com.

Jacob_Perkins · February 27, 2011, 3:10pm

The bulk-loader we wrote uses the Java bulk API, see Wonderdog (now at https://github.com/infochimps-labs/wonderdog).

It should be pretty straightforward to navigate through as the amount
of code is rather minimal (thanks to elasticsearch's awesome api)

--jacob
@thedatachef

pzelnip · February 28, 2011, 9:08pm

Wow, that's pretty impressive that so much is done with so little.
Thanks for pointing me in the direction of it, that code will most
definitely be helpful.

Adam

On Feb 27, 7:10 am, Jacob Perkins jacob.a.perk...@gmail.com wrote:

The bulk-loader we wrote uses the java bulk api, see:

GitHub - infochimps/wonderdog (Wonderdog is now at https://github.com/infochimps-labs/wonderdog): ElasticSearch and Hadoop and beautiful bouncy elephant love.

