Disabling indexing during bulk publishing using the Java API


(Russell Snyder) #1

Hi,

We're trying to load 400k documents using the Java bulk API (Client
and BulkRequestBuilder) and are running into timeout issues that we
think could be remedied by disabling index refresh during the publish, then
re-enabling it as soon as the publish completes. Is there a way to use the
Client to execute the PUT call below, or something similar?

curl -XPUT localhost:9200/test/_settings -d '{
"index" : {
"refresh_interval" : "-1"
}
}'

Any ideas or suggestions for tuning Elasticsearch to better handle a
massive volume of writes?

Thanks,
Russell



(Ivan Brusic) #2

Here is a long way to update the settings via the client:

// indexName: the index you are about to bulk load into
ImmutableSettings.Builder settings = ImmutableSettings.settingsBuilder();
settings.put("refresh_interval", -1); // -1 disables periodic refresh
UpdateSettingsRequest updateSettingsRequest = new UpdateSettingsRequest(indexName);
updateSettingsRequest.settings(settings.build());
client.admin().indices().updateSettings(updateSettingsRequest).actionGet();
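For comparison, the same `_settings` PUT from the curl command in the question can also be sent from plain JDK code, without the admin client. This is a minimal sketch, assuming a node reachable at `http://localhost:9200` as in the question; `RestSettings`, `refreshIntervalBody`, and `putSettings` are hypothetical helper names. The JSON body is built by string concatenation, so it is only meant for simple values such as `-1` or `1s`.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: the curl _settings PUT, using only the JDK.
final class RestSettings {
    private RestSettings() {}

    // Builds {"index":{"refresh_interval":"<interval>"}}.
    // No JSON escaping is done, so pass plain values like "-1" or "1s".
    static String refreshIntervalBody(String interval) {
        return "{\"index\":{\"refresh_interval\":\"" + interval + "\"}}";
    }

    // PUTs jsonBody to <baseUrl>/<index>/_settings and returns the HTTP status code.
    static int putSettings(String baseUrl, String index, String jsonBody) throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
                URI.create(baseUrl + "/" + index + "/_settings").toURL().openConnection();
        try {
            conn.setRequestMethod("PUT");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/json");
            try (OutputStream out = conn.getOutputStream()) {
                out.write(jsonBody.getBytes(StandardCharsets.UTF_8));
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

Usage would be `RestSettings.putSettings("http://localhost:9200", "test", RestSettings.refreshIntervalBody("-1"))` before the bulk load, then the same call with `"1s"` (the default refresh interval) once it finishes.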

Setting the refresh interval to -1 is the only setting I change before bulk
indexing into a live index. For new indices, I remove all replicas first and
add them back after indexing is done (waiting for a green state before
actually using the index). Indexing should be completely under your control
unless you are using a river. Can you implement a singleton that controls
indexing?
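The workflow described above (refresh off, replicas dropped for a new index, both restored afterwards, then wait for green) can be kept in one place, which also fits the singleton suggestion. This is a sketch under stated assumptions: `IndexAdmin` is a hypothetical interface you would implement with your client version's own calls (for example the `UpdateSettingsRequest` above plus a cluster health request), `BulkLoader` is a hypothetical enum singleton, and refresh is restored to `1s`, Elasticsearch's default, rather than to whatever value the index had before.

```java
// Hypothetical abstraction over the admin calls your Elasticsearch client provides.
interface IndexAdmin {
    void updateSetting(String index, String key, String value);
    void waitForGreen(String index);
}

// Hypothetical enum singleton: the one place in the application that runs bulk
// loads, so two loads cannot toggle the same index settings concurrently.
enum BulkLoader {
    INSTANCE;

    // Assumed value to restore afterwards; "1s" is Elasticsearch's default.
    static final String DEFAULT_REFRESH = "1s";

    synchronized void load(IndexAdmin admin, String index, boolean newIndex,
                           int replicas, Runnable bulkIndexing) {
        // Stop periodic refreshes while documents are being written.
        admin.updateSetting(index, "refresh_interval", "-1");
        if (newIndex) {
            // New index: drop replicas so each document is indexed only once.
            admin.updateSetting(index, "number_of_replicas", "0");
        }
        try {
            bulkIndexing.run();
        } finally {
            // Re-enable refresh even if the bulk load throws.
            admin.updateSetting(index, "refresh_interval", DEFAULT_REFRESH);
            if (newIndex) {
                admin.updateSetting(index, "number_of_replicas", Integer.toString(replicas));
                // Wait for replicas to be allocated before the index is used.
                admin.waitForGreen(index);
            }
        }
    }
}
```

The `try`/`finally` guarantees that refresh is switched back on even when a bulk request fails partway, and `synchronized` serializes callers that share the singleton.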

Cheers,

Ivan

On Fri, Oct 5, 2012 at 8:32 AM, Russell Snyder
<russell@redowlanalytics.com> wrote:



(Russell Snyder) #3

Ivan,

Thanks for the response. That sounds like a pretty viable solution, and
I'll give it a try shortly.

Thanks,
Russell

On Friday, October 5, 2012 11:32:19 AM UTC-4, Russell Snyder wrote:


