Disabling indexing during bulk publishing using the Java API

Russell_Snyder · October 5, 2012, 3:32pm

Hi,

We're trying to load 400k documents using the Java bulk API (Client
and BulkRequestBuilder) and are running into timeout issues. We think
they could be remedied by disabling indexing during the publish and
re-enabling it as soon as the publish completes. Is there a way to use
the Client to execute the PUT call below, or something similar?

curl -XPUT localhost:9200/test/_settings -d '{
    "index" : {
        "refresh_interval" : "-1"
    }
}'

Any ideas or suggestions for tuning Elasticsearch to better handle a
massive amount of writes?

Thanks,
Russell

--

Ivan · October 5, 2012, 4:33pm

Here is a long way to update the settings via the client:

ImmutableSettings.Builder settings = ImmutableSettings.settingsBuilder();
// Same setting as the curl call; -1 disables the periodic refresh.
settings.put("index.refresh_interval", -1);
UpdateSettingsRequest updateSettingsRequest = new UpdateSettingsRequest(indexName);
updateSettingsRequest.settings(settings.build());
client.admin().indices().updateSettings(updateSettingsRequest).actionGet();

Setting the refresh interval to -1 is the only setting I change before bulk
indexing to a live index. For new indices, I remove all replicas first and
add them in after indexing is done (waiting for a green state before
actually using the index). Indexing should be completely under your control
unless you are using a river. Can you implement a singleton that controls
indexing?
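
A minimal sketch of the "single point that controls indexing" idea, assuming the documents are sent in fixed-size batches. IndexGateway, BulkLoader and RecordingGateway are hypothetical names, not Elasticsearch types: in a real loader the two gateway methods would wrap the updateSettings call and a BulkRequestBuilder. The batch size of 1000 and the "1s" restore value are illustrative assumptions as well. The point is the try/finally, which restores the refresh interval even if a batch fails.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the two calls a loader needs; not an Elasticsearch type. */
interface IndexGateway {
    void setRefreshInterval(String index, String value); // would wrap updateSettings(...)
    void sendBulk(String index, List<String> docs);      // would wrap a BulkRequestBuilder
}

/** Loads documents in batches with refresh disabled, then restores the interval. */
final class BulkLoader {
    private final IndexGateway gateway;
    private final int batchSize;

    BulkLoader(IndexGateway gateway, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.gateway = gateway;
        this.batchSize = batchSize;
    }

    /** synchronized so only one load at a time runs through this instance. */
    synchronized int load(String index, List<String> docs, String restoreInterval) {
        gateway.setRefreshInterval(index, "-1"); // disable periodic refresh
        int batches = 0;
        try {
            for (int from = 0; from < docs.size(); from += batchSize) {
                int to = Math.min(from + batchSize, docs.size());
                gateway.sendBulk(index, new ArrayList<>(docs.subList(from, to)));
                batches++;
            }
        } finally {
            // Runs on success and on failure, so refresh is never left disabled.
            gateway.setRefreshInterval(index, restoreInterval);
        }
        return batches;
    }
}

/** In-memory gateway that only records the calls it receives. */
final class RecordingGateway implements IndexGateway {
    final List<String> log = new ArrayList<>();

    public void setRefreshInterval(String index, String value) {
        log.add("refresh=" + value);
    }

    public void sendBulk(String index, List<String> docs) {
        log.add("bulk:" + docs.size());
    }
}

public class BulkLoaderDemo {
    public static void main(String[] args) {
        RecordingGateway gateway = new RecordingGateway();
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            docs.add("{\"id\":" + i + "}");
        }
        int batches = new BulkLoader(gateway, 1000).load("test", docs, "1s");
        System.out.println(batches + " batches: " + gateway.log);
    }
}
```

With the recording gateway, 2500 documents produce three bulk calls (1000, 1000, 500) bracketed by one call that disables refresh and one that restores it.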

Cheers,

Ivan

--

Russell_Snyder · October 5, 2012, 5:12pm

Ivan,

Thanks for the response. That sounds like a pretty viable solution, and
I'll give it a try shortly.

Thanks,
Russell

--
