Are there any examples anywhere of using the bulk Java API? There are examples for the REST API on the website, but nothing for the Java side. Basically, what I want to do is index a bunch of documents at once, and create the index if it does not already exist.
Currently I have the following (DataItem is the class which holds a document to index, items is a List<DataItem>, and buildJSON() is a method for building the JSON for a DataItem):
    BulkRequestBuilder brb = client.prepareBulk();
    for (DataItem di : items) {
        brb.add(
            client.prepareIndex(indexName, typeName, di.get_id())
                  .setSource(buildJSON(di.getDataItems()))
        );
    }
    brb.execute().actionGet();
This seems to work if the index already exists. However, the code is essentially my guess at how the Java bulk API is supposed to be used, as there is no documentation for it on the website. So I have 3 questions:
1. (newbie-ish question) Is the above code correct, or am I doing it wrong?
2. How do I (using the Java API) create the index before handing it off to the bulk API?
3. (less newbie-ish) How many documents can/should I throw at the bulk API at once? As it stands I'm doing them all at once (I'm only prototyping at this point, so collections are small), but how well will it scale? Is thousands OK? Tens of thousands? Hundreds of thousands? Millions?
On Friday, February 25, 2011 at 12:00 AM, pzelnip wrote:
(newbie-ish question) Is the above code correct/am I doing it wrong?
Yep, looks good. You can also do brb.add(Requests.indexRequest(...)).
How do I (using the Java API) create the index before handing it off to
the bulk API?
There is a create index API: client.admin().indices().prepareCreate("index_name")...
(less newbie-ish) how many documents can/should I throw off to the bulk
API at once? As it stands I'm doing them all at once (I'm only prototyping
at this point, so collections are small), but how well will it scale? Is
thousands ok? Tens of thousands? Hundreds of thousands? Millions?
The whole bulk request has to fit in memory while it is being indexed, so make sure you don't put too many documents into a single bulk request.
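One way to keep each bulk request bounded is to split the list into fixed-size batches and send one bulk request per batch. Below is a minimal sketch of that idea in plain Java. BulkBatcher, forEachBatch and the sendBatch callback are hypothetical names, not part of the Elasticsearch API, and the sketch assumes Java 8 or later for java.util.function.Consumer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper (not part of the Elasticsearch API): splits a list of
// documents into fixed-size batches and hands each batch to a callback.
final class BulkBatcher {

    private BulkBatcher() {}

    // Calls sendBatch once per consecutive slice of at most batchSize items.
    static <T> void forEachBatch(List<T> items, int batchSize, Consumer<List<T>> sendBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the slice so the callback gets an independent list,
            // not a view backed by the full collection.
            sendBatch.accept(new ArrayList<>(items.subList(start, end)));
        }
    }
}
```

Inside the callback you would build one BulkRequestBuilder per batch with client.prepareBulk(), add one prepareIndex(...) request per item as in the code from the question, and then call execute().actionGet(). The right batch size depends on document size and available heap, so any concrete number is something to measure rather than assume.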
Wow, that's pretty impressive that so much is done with so little.
Thanks for pointing me in the direction of it, that code will most
definitely be helpful.