Does index routing work with Transport Client?


(Amit) #1

Hello All,

I am really confused. I have tried different things to make index routing
allocation work with the Transport Client, but it does not seem to work.
What I mean is, can I really specify

index.routing.allocation.include.tag: node1

with the Transport Client to route an index?

I am new to Elasticsearch and am just trying out the different options it
offers.

Please suggest.

Thanks

Amit

--


(Frederic) #2

Hi Amit,

I think there are two different concepts involved in your question: hitting
the "right" server when indexing, in order to index faster, and routing a
doc within the ES cluster, in order to have some type of docs hosted
together ('sharded') in one shard and so improve searches (i.e. hitting
only the server/shard that has the data).

A Transport Client does not join the cluster the way a Node client does, so
it does not hold the cluster's routing information; it simply hits the
servers it was configured with in a round-robin fashion. Thus, such a client
cannot send a document straight to a specific server/shard (the former
operation), but you can certainly do the latter.

If, when creating an index, you declare a mapping for your doc and define a
'_routing' field
(http://www.elasticsearch.org/guide/reference/mapping/routing-field.html),
then when a doc is indexed, ES checks that field and internally redirects the
document to the corresponding shard. That means the client probably won't
hit the destination server directly, but the doc will be routed internally.

Hope it helps!

Fede

--


(Amit) #3

Thanks Fede!! You saved me from going mad :)

Yes, I agree that the Transport Client (TC) does not know the state of the
cluster. I debugged the entire indexing code and have come to the conclusion
that when an index is created, the TC creates the bytes for the index
request and calls the Elasticsearch (ES) servers in a round-robin fashion.
The server reads the input byte stream and extracts it to create the
Create Index Request object. It also creates the Index Meta Data object,
along with other routing, cluster, allocation, etc. objects, and uses these
objects to place/route the index.

I am not convinced, though: why is the index metadata, if the client has
any, not passed to the server? The server could then use it to index the
data on a particular node/server. What restriction prevents this? Something
like:

// Hypothetical: build the index metadata on the client side.
MetaData metaData = newMetaDataBuilder()
    .put(newIndexMetaDataBuilder("test5").settings(settingsBuilder()
        .put("index.number_of_shards", 2)
        .put("index.number_of_replicas", 0)
        .put("index.routing.allocation.include.tag", "node2")
        .put("index.routing.allocation.exclude.tag", "node1")
        .build()))
    .build();

IndexRequest source = Requests.indexRequest("test5").type("type").source("field", "value");

// Look up the mapping for the request's type, if the index is known.
MappingMetaData mappingMd = null;
if (metaData.hasIndex(source.index())) {
    mappingMd = metaData.index(source.index()).mapping(source.type());
}

// Process the request against the client-side metadata.
source.process(metaData, "test5", mappingMd, false);

I ask because I see that the server creates this object itself, but from
its own server configuration file.

2- I tried using the routing parameter available on the Index Request and
gave it a value, say "node1". But when I look at the data folders on all
the servers, I see that the size has increased for all of them, although I
did not look into the details of the data. As you said, the data should
have been routed only to one particular shard/node.

Please suggest.

Thanks
Amit

--


(Jörg Prante) #4

Hi Amit,

You have examined the source, so I think you understand the concept. On the
one hand, cluster nodes (server side) are responsible for the internal ES
operations and use the cluster metadata (where you want to put the routing
parameters). On the other hand, the TransportClient (and NodeClient) are
totally isolated from this: they remote-control the cluster nodes, have
their own settings, and do not control index creation but just send a
command to the cluster nodes to do it.

So, just use the Cluster Update API in an extra step before creating the
index. Issue something like this to change the routing:

TransportClient client = ...
Map<String, Object> newSettings = new HashMap<String, Object>();
newSettings.put(key, value);
...
ClusterUpdateSettingsRequestBuilder settingsRequest =
    client.admin().cluster().prepareUpdateSettings();
settingsRequest.setPersistentSettings(newSettings);
ClusterUpdateSettingsResponse settingsResponse =
    settingsRequest.execute().actionGet();

and then, after updating the cluster, create the index.

Because it is a little bit confusing to find out the responsibilities of
the server side and the client side, I started a project to modularize the
ES codebase into server code and client code.

Best regards,

Jörg

--


(Amit) #5

Thanks, Jörg, for the reply.

I have updated the cluster settings as you suggested:

TransportClient tc = new TransportClient(s);
tc.addTransportAddress(new InetSocketTransportAddress("192.168.1.102", 9300));
tc.addTransportAddress(new InetSocketTransportAddress("192.168.1.102", 9301));

// Cluster-wide allocation filter: include node2, exclude node1.
Map<String, Object> newSettings = new HashMap<String, Object>();
newSettings.put("cluster.routing.allocation.include.tag", "node2");
newSettings.put("cluster.routing.allocation.exclude.tag", "node1");

ClusterUpdateSettingsRequestBuilder settingsRequest =
    tc.admin().cluster().prepareUpdateSettings();
settingsRequest.setPersistentSettings(newSettings);
ClusterUpdateSettingsResponse settingsResponse =
    settingsRequest.execute().actionGet();

// Index three documents into "test12" in one bulk request.
BulkRequestBuilder bulkRequest = tc.prepareBulk();
for (int i = 0; i < 3; i++) {
    IndexRequest source = Requests.indexRequest("test12").type("type").source("field", "value");
    bulkRequest.add(source);
}
bulkRequest.execute().actionGet();

But I am not getting the desired result. First I included node1 and
excluded node2, updated the settings, and created the index. Indexing
worked as expected.

Then I changed the settings to include node2 and exclude node1, updated the
settings, and indexed the data. But indexing still behaves according to the
previous settings.

Secondly, I cannot specify an index routing setting such as
index.routing.allocation.include.tag1 in the settings map.

The server log says:
ignoring persistent setting [index.routing.allocation.include.tag1], not
dynamically updateable

Please suggest.

Thanks
Amit

--


(Jörg Prante) #6

Hi Amit,

On Wednesday, September 26, 2012 1:50:58 PM UTC+2, Amit Singh wrote:

But I am not getting the desired result. First I included node1 and
excluded node2, updated the settings, and created the index. Indexing
worked as expected.

Then I changed the settings to include node2 and exclude node1, updated the
settings, and indexed the data. But indexing still behaves according to the
previous settings.

Yes, the cluster setting is cluster-wide. Now I understand: what you want to
achieve is shard placement control per index. Have you tried the commands
shown at
http://blog.sematext.com/2012/05/29/elasticsearch-shard-placement-control/ ?

Secondly, I cannot specify an index routing setting such as
index.routing.allocation.include.tag1 in the settings map.

The server log says:
ignoring persistent setting [index.routing.allocation.include.tag1], not
dynamically updateable

This is expected, as the Cluster Update API does not work per index. Use the
Index Settings API.

Best regards,

Jörg

--


(Frederic) #7

Hi Amit, Jörg

I was writing a long reply but Google Groups failed and I lost it :) So,
here is the short one:

Amit, are you sure that you don't want to 'route' documents instead of
allocating shards/indices? Allocating shards/indices tells ES exactly on
which nodes you want the information to be placed. That is quite low-level.
It allows you to give more or fewer resources to a given index in a cluster,
but it only makes sense if you have multiple indices.
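
To make the allocation side concrete, here is a minimal sketch in plain Java. It is a toy model, not Elasticsearch code: the class `AllocationFilterSketch`, the method `eligibleNodes`, and the node names `es-a`, `es-b`, `es-c` are all made up for illustration, and it assumes each node carries a single `tag` attribute and that the include/exclude filters hold a single value each. It only shows the idea that an include/exclude filter narrows down which nodes may hold an index's shards.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of per-index allocation filtering; not Elasticsearch code. */
public class AllocationFilterSketch {

    /**
     * Returns the names of nodes whose "tag" attribute passes the filter.
     * includeTag / excludeTag may be null, meaning "no constraint".
     */
    static List<String> eligibleNodes(Map<String, String> nodeTags,
                                      String includeTag, String excludeTag) {
        List<String> eligible = new ArrayList<>();
        for (Map.Entry<String, String> node : nodeTags.entrySet()) {
            String tag = node.getValue();
            boolean included = includeTag == null || includeTag.equals(tag);
            boolean excluded = excludeTag != null && excludeTag.equals(tag);
            if (included && !excluded) {
                eligible.add(node.getKey());
            }
        }
        return eligible;
    }

    public static void main(String[] args) {
        // Hypothetical cluster: node name -> value of its "tag" attribute.
        Map<String, String> nodeTags = new LinkedHashMap<>();
        nodeTags.put("es-a", "node1");
        nodeTags.put("es-b", "node2");
        nodeTags.put("es-c", "node2");

        // An index with include.tag=node2 and exclude.tag=node1.
        System.out.println(eligibleNodes(nodeTags, "node2", "node1")); // prints [es-b, es-c]
    }
}
```

The filter decides where the shards of an index may live; it says nothing about which shard a given document ends up in.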

On the other hand, you have document routing, which is largely independent
of the previous setting. It means that a given kind of doc will be indexed
in a specific shard, but that shard is chosen by ES. This functionality is
especially useful when you have, for instance, information associated with
users (e.g. eBay items). If you don't route docs, they will be uniformly
distributed across all shards (and eventually nodes) when indexed, so when
searching that info, ES will hit all shards in order to gather it.
If you do route docs, by user.id for example, all docs associated with the
same user will be placed in the same shard (and its replicas, of course),
and when you search items for a given user with the same routing value, ES
will only hit the shards/nodes that have the info, saving requests and time.
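
A rough sketch of the document-routing idea, again in plain Java rather than Elasticsearch code. The class `DocumentRoutingSketch`, the method `shardFor`, and the simple string hash are assumptions chosen for illustration; the real hash function differs. The point is only that the routing value is hashed and reduced modulo the number of primary shards, so a routing value such as "node1" does not name a node, it just selects a shard deterministically.

```java
import java.nio.charset.StandardCharsets;

/** Toy model of document routing; the hash is illustrative, not the one Elasticsearch uses. */
public class DocumentRoutingSketch {

    /** Maps a routing value to a shard number in [0, numberOfShards). */
    static int shardFor(String routing, int numberOfShards) {
        int hash = 0;
        for (byte b : routing.getBytes(StandardCharsets.UTF_8)) {
            hash = 31 * hash + b;
        }
        // floorMod keeps the result non-negative even when the hash overflows.
        return Math.floorMod(hash, numberOfShards);
    }

    public static void main(String[] args) {
        int shards = 5;
        // Every document routed by the same user id lands on the same shard.
        System.out.println(shardFor("user-42", shards) == shardFor("user-42", shards)); // prints true
        // A search that supplies routing "user-42" only needs to query this shard.
        System.out.println("user-42 -> shard " + shardFor("user-42", shards));
    }
}
```

Which node that shard lives on is a separate question, decided by shard allocation.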

This depends, of course, on your requirements, but it is important to note
the difference between shards and nodes when thinking about where your data
is hosted.

Cheers!

--

