Java client connection question

Vinicius_Carvalho · November 30, 2012, 8:40pm

Hi there.

I'm doing some tests here, and before executing a bulk index in my cluster
(2 nodes, replicas = 1) I set index.refresh_interval = -1.

I then connect to one of the nodes using the transport client pointed at
host:9300.

I was expecting that only that node would have the data inserted (since
I've set the interval to -1). Isn't this the case?

I'm trying to come up with a good solution to index a lot of data in a live
environment without penalizing the live nodes. I'm trying to sync all the
data to one node first, and then let it replicate to the others. Any
advice on this?

Regards

--

jprante · December 1, 2012, 12:32am

Hi Vinicius,

you always ingest the data to the nodes that hold the shards of the index,
whichever node your client connects to, as long as you don't set routing
explicitly; see http://www.elasticsearch.org/guide/reference/api/index_.html
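
To illustrate why connecting to a single node does not pin the data there,
here is a minimal sketch of how a document is mapped to a shard. It assumes
the documented rule shard = hash(routing) % number_of_primary_shards, where
the routing value defaults to the document id. String.hashCode() is only a
stand-in for the hash Elasticsearch uses internally, and the class name,
ids and shard count are made up for the example.

```java
// Sketch only: String.hashCode() stands in for Elasticsearch's internal hash.
final class ShardRoutingSketch {

    // Picks a primary shard from the routing value (by default the document id).
    // floorMod keeps the result in [0, numberOfPrimaryShards) for negative hashes.
    static int shardFor(String routing, int numberOfPrimaryShards) {
        return Math.floorMod(routing.hashCode(), numberOfPrimaryShards);
    }

    public static void main(String[] args) {
        int shards = 5; // hypothetical primary shard count
        for (String id : new String[] {"doc-1", "doc-2", "doc-3", "doc-4"}) {
            System.out.println(id + " -> shard " + shardFor(id, shards));
        }
    }
}
```

The node the client talks to only forwards each document to whichever node
holds the computed shard, so the data ends up spread over the cluster.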

But explicit routing is the exception. Think of Elasticsearch as a
multi-node system that was invented to distribute the load over your nodes
automatically. No manual intervention required. You won't need to worry
about indexing penalizing search. If your load rises, add more
nodes. It's that easy.

Only in exceptional situations might you consider shard placement control
and other techniques, such as moving parts of an index or avoiding hot spots.

For baseline ingesting, just set up the index with primary shards only (no
replicas) for faster data loading. After ingesting is done, add a replica
level to the shards.
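
A rough sketch of that sequence, for illustration only. It goes through the
REST update-settings endpoint (PUT /<index>/_settings) with the JDK's own
HTTP client rather than the transport client, to stay free of dependencies.
The address localhost:9200, the index name "myindex" and the class and
method names are assumptions, not something from this thread.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch only: host, port and index name are placeholders.
final class BulkLoadSettings {

    // Builds the JSON body for the update-settings endpoint.
    static String body(String refreshInterval, int replicas) {
        return "{\"index\":{\"refresh_interval\":\"" + refreshInterval
                + "\",\"number_of_replicas\":" + replicas + "}}";
    }

    // Sends PUT <baseUrl>/<index>/_settings and returns the HTTP status code.
    static int apply(String baseUrl, String index, String json)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest
                .newBuilder(URI.create(baseUrl + "/" + index + "/_settings"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        String baseUrl = "http://localhost:9200"; // assumed REST endpoint
        String index = "myindex";                 // hypothetical index name
        apply(baseUrl, index, body("-1", 0));     // before the load: no refresh, no replicas
        // ... run the bulk indexing here ...
        apply(baseUrl, index, body("1s", 1));     // after the load: default refresh, one replica
    }
}
```

Both settings can be changed on an existing index, so the replica is only
built once, after the load, instead of during every bulk request.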

Best regards,

Jörg

--
