Node indexing data locality

I have a dataset I would like to index, and it does not matter which shard
each document is routed to.
The dataset is terabytes in size and I find the network trips of
redistributing the documents across the cluster very wasteful.
What I would like is for each node to index its own local data.
Is it possible to do that? Perhaps programmatically?

--

Have you tried bulk indexing? It effectively reduces the number of network
round trips.
Note that the node client also uses the network to communicate with the
cluster nodes.
Usually, there is no advantage in indexing data only to the local node,
since all documents should be distributed over the data nodes by the
sharding algorithm. You can enforce shard locality for all indexed docs
by using the same routing parameter value, and by assigning shards to nodes
statically. But such things are tedious to implement, and many cool
Elasticsearch features are disabled then.
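
To make the bulk suggestion concrete, here is a minimal sketch in Python of how a bulk request body can be assembled: one action line followed by one source line per document, newline-delimited. The index name `docs`, the sample documents, and the helper `build_bulk_body` are hypothetical. The optional `routing` key in the action line is an assumption about the version in use (older releases spelled it `_routing`), so check the documentation for your version.

```python
import json

def build_bulk_body(index, docs, routing=None):
    """Return a newline-delimited bulk body for the given documents.

    `docs` is an iterable of (doc_id, source_dict) pairs. If `routing` is
    given, every document carries the same routing value and is therefore
    sent to the same shard.
    """
    lines = []
    for doc_id, source in docs:
        meta = {"_index": index, "_id": doc_id}
        if routing is not None:
            meta["routing"] = routing  # assumption: key name varies by version
        lines.append(json.dumps({"index": meta}))  # action line
        lines.append(json.dumps(source))           # document source line
    # The bulk body must end with a newline after the last line.
    return "\n".join(lines) + "\n"

# Hypothetical sample documents.
body = build_bulk_body("docs", [("1", {"title": "a"}), ("2", {"title": "b"})])
print(body, end="")
```

The resulting string would be sent as the body of a single bulk request, so many documents share one network round trip instead of one trip each.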

Jörg

(In reply to Hadar Rottenberg's message of 28.01.13 17:16.)

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group, send email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

How can I use the routing parameter to index into, say, 3 shards (of the
same index, but distributed) out of my 20 shards (the total across multiple
indexes)?
I mean, does the routing parameter support wildcards?
Otherwise, the data will just keep going to a single shard.

(In reply to Jörg Prante's message of Monday, January 28, 2013 11:52:02 PM UTC+5:30.)


Exactly, a routing value sends documents to one shard only. You can't
restrict indexing to a single node, because the primary shards are
usually distributed over several nodes.
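
Why a single routing value pins documents to one shard can be sketched as follows. Roughly, Elasticsearch picks the target shard by hashing the routing value (the document ID by default) and taking the result modulo the number of primary shards. The sketch below uses `zlib.crc32` purely as a stand-in hash. That is an assumption for illustration and not the hash function Elasticsearch uses, so the shard numbers it prints will not match a real cluster. The routing values and shard count are hypothetical.

```python
import zlib

def shard_for(routing_value, num_primary_shards):
    """Map a routing value to a shard number using a stand-in hash."""
    h = zlib.crc32(routing_value.encode("utf-8"))
    return h % num_primary_shards

num_shards = 5  # hypothetical number of primary shards in one index

# The same routing value always yields the same shard.
assert shard_for("customer-42", num_shards) == shard_for("customer-42", num_shards)

# Several distinct routing values can reach several shards, but two values
# may also collide on one shard. A routing value is hashed as a plain
# string, so a wildcard pattern would not expand to a set of shards.
for value in ("key-a", "key-b", "key-c"):
    print(value, "->", shard_for(value, num_shards))
```

Under this model, spreading documents over a few shards means rotating among a few routing values and accepting that the hash, not the client, decides which shards those are.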

Jörg

(In reply to tarang dawer's message of 29.01.13 15:14.)


Not to the same node, but could it at least go only to those 3 primary
nodes (of the same index), which are distributed? Is that possible in some
way? For example, could I specify the routing parameter as index* (with *
corresponding to index1, index2, index3), all on separate nodes?


(In reply to tarang dawer's message of Tue, 2013-01-29 at 20:45 +0530.)

You could use multiple indices, and use "awareness" to control which
index is on which box. Then you can just index to index_1 from box_1, etc.

You can always run queries across all indices:

GET /index_1,index_2/_search
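
A minimal sketch of this multi-index layout in Python, building the settings and request paths as plain data. It assumes each node was started with a custom attribute (called `box` here, a hypothetical name) and pins each index to one node through the `index.routing.allocation.require.*` index setting. Whether that is the exact mechanism meant by "awareness" above is also an assumption. The index and box names follow the example in the message.

```python
def allocation_settings(box):
    """Index settings that require shards to live on nodes tagged `box`."""
    # assumption: nodes carry a custom attribute named "box"
    return {"settings": {"index.routing.allocation.require.box": box}}

def search_path(indices):
    """Path for one search across several indices, comma-separated."""
    return "/" + ",".join(indices) + "/_search"

boxes = ["box_1", "box_2"]
indices = ["index_%d" % (i + 1) for i in range(len(boxes))]

for index, box in zip(indices, boxes):
    # Each pair would be sent as an index-creation request for <index>
    # with the settings as the request body.
    print("PUT /" + index, allocation_settings(box))

print("GET " + search_path(indices))
```

Each box then indexes only into its own index, and the comma-separated path lets a single search still cover all of them.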

clint
