Hi,
I've been indexing about 90,000 documents in about 8 minutes (roughly
190 documents per second). Is that good performance for Elasticsearch?
I think other people have seen much better throughput. I've based my
indexing on the Wikipedia river. I have a few questions:
- Are bulk updates the way to go for performance, or is iterating
  through individual saves just as fast?
- Can I increase the bulk size from 100 to 1,000 or 10,000?
- Will several concurrent threads adding bulk updates to the same
  master node help performance?
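To put numbers on the questions above, here is a small back-of-the-envelope sketch. It only does the arithmetic for the figures in this post (90,000 documents, 8 minutes, bulk sizes of 100, 1,000 and 10,000). The class and method names (`ThroughputEstimate`, `docsPerSecond`, `bulkRequestsNeeded`) are made up for illustration and are not part of any Elasticsearch API.

```java
class ThroughputEstimate {
    /** Average indexing rate in documents per second. */
    static double docsPerSecond(long documents, long elapsedSeconds) {
        return (double) documents / elapsedSeconds;
    }

    /** Number of bulk requests needed to send all documents (the last one may be partial). */
    static long bulkRequestsNeeded(long documents, int bulkSize) {
        return (documents + bulkSize - 1) / bulkSize;
    }

    public static void main(String[] args) {
        long documents = 90_000;        // figure from this post
        long elapsedSeconds = 8 * 60;   // "about 8 minutes"
        System.out.println("rate: " + docsPerSecond(documents, elapsedSeconds) + " docs/s");
        for (int bulkSize : new int[] {100, 1_000, 10_000}) {
            System.out.println("bulk size " + bulkSize + ": "
                    + bulkRequestsNeeded(documents, bulkSize) + " requests");
        }
    }
}
```

So the current run averages 187.5 documents per second, and going from a bulk size of 100 to 10,000 would cut the number of round trips from 900 to 9. This says nothing about which size is actually fastest; that would have to be measured.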
Thanks. My current update code is below.
David.
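// Imports this snippet relies on (Elasticsearch Java client):
import org.elasticsearch.action.ActionListener;
import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.Requests;

// Adds one document to the current bulk request. Returns true when the
// bulk was flushed and the caller should start a new BulkRequestBuilder.
public boolean bulkSave(JsonBean jsonBean, BulkRequestBuilder currentRequest) {
    IndexRequest indexOperation = Requests.indexRequest(indexName)
            .type(jsonBean.getIndexType())
            .id(String.valueOf(jsonBean.getId()))
            .create(false)
            .source(jsonBean.getJson());
    currentRequest.add(indexOperation);
    return processBulkIfNeeded(currentRequest, processedAccount++);
}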
// Executes the bulk request once it holds bulkSize actions. Returns true
// when the caller should create a fresh BulkRequestBuilder.
private boolean processBulkIfNeeded(BulkRequestBuilder currentRequest, int processedAccounts) {
    if (currentRequest.numberOfActions() < bulkSize) {
        return false;
    }
    final int bulkNo = onGoingBulks.incrementAndGet();
    LOG.info("Ongoing Bulks = " + bulkNo + " processed Accounts = " + processedAccounts);
    if (bulkNo > dropThreshold) {
        // TODO: just wait here instead of dropping; we can slow down the parsing
        onGoingBulks.decrementAndGet();
        LOG.error("dropping bulk, " + bulkNo + " crossed threshold " + dropThreshold
                + " processed Accounts = " + processedAccounts);
    } else {
        try {
            LOG.info("Executing Bulk Request " + bulkNo);
            currentRequest.execute(new ActionListener<BulkResponse>() {
                @Override
                public void onResponse(BulkResponse bulkResponse) {
                    onGoingBulks.decrementAndGet();
                    LOG.info("Bulk [" + bulkNo + "] Executed");
                }

                @Override
                public void onFailure(Throwable e) {
                    // Decrement here too, otherwise failed bulks count as in flight forever
                    onGoingBulks.decrementAndGet();
                    LOG.error("Error in Bulk [" + bulkNo + "]", e);
                }
            });
        } catch (Exception e) {
            LOG.error("failed to process bulk", e);
        }
    }
    // Once a bulk request has been executed (or dropped), start a new one
    // for a fresh set of bulk updates, deletes, additions, etc.
    return true;
}
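
One possible way to handle the "just wait here" TODO, instead of dropping bulks once the threshold is crossed, is to make the producer block on a counting semaphore until an earlier bulk completes. The following is only a sketch under assumptions: the Elasticsearch call is replaced by a stand-in thread pool, and `BulkThrottle`, `beforeSend`, `afterCompletion` and `simulate` are hypothetical names, not part of any Elasticsearch API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BulkThrottle {
    private final Semaphore permits;

    BulkThrottle(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    /** Blocks the producer while maxInFlight bulks are already outstanding. */
    void beforeSend() throws InterruptedException {
        permits.acquire();
    }

    /** Must be called from both the success and the failure callback. */
    void afterCompletion() {
        permits.release();
    }

    /** Sends fake bulks through a thread pool and returns the peak number in flight. */
    static int simulate(int bulks, int maxInFlight) {
        BulkThrottle throttle = new BulkThrottle(maxInFlight);
        ExecutorService fakeCluster = Executors.newFixedThreadPool(8); // stand-in for the cluster
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try {
            for (int i = 0; i < bulks; i++) {
                throttle.beforeSend(); // waits here instead of dropping the bulk
                fakeCluster.submit(() -> {
                    try {
                        peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                        Thread.sleep(5); // pretend the bulk takes a while
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        running.decrementAndGet();
                        throttle.afterCompletion(); // released on success and on failure
                    }
                });
            }
            fakeCluster.shutdown();
            fakeCluster.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while simulating bulks", e);
        }
        return peak.get();
    }
}
```

In the code above, the equivalent would presumably be to call something like `beforeSend()` just before `currentRequest.execute(...)` and `afterCompletion()` in both `onResponse` and `onFailure`, so that parsing slows down to match what the cluster can absorb. For example, `BulkThrottle.simulate(20, 3)` never reports more than 3 bulks in flight at once.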