Limit searches to specific shards

jason_3 · October 10, 2011, 1:25pm

Is it possible to create some sort of client (Transport client, Node
client, or REST) that does either of the following?

1. Only searches the shards local to the server the client is
   connected to (for a given index/type/etc.), OR
2. Lets the client specify which shards the search is limited to (for a
   given index/type/etc.).

I don't think the "routing" feature supports limiting a search to a
specified shard number.

Here is my use case. We would like to use ES as both the index and
the data store. We need to be able to run MapReduce across all the data
in the data store periodically. Each node runs ES, a Hadoop DataNode,
and a Hadoop TaskTracker. I want to create a Hadoop InputFormat that is
truly data-local based on the shards. We evaluated wonderdog and it
works okay, but having truly data-local searches would be more
scalable.

If the client could limit searches to specific shards, or to only
local shards, then I think this would be possible, since I can
determine which shards reside on which machines via the cluster state
(curl -s -XGET 'http://localhost:9200/_cluster/state', or the
equivalent Java code).
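
To illustrate the shard-to-machine lookup described above, here is a minimal Python sketch. It assumes the cluster state JSON has already been fetched and parsed, and that shard copies are listed under routing_table -> indices -> (index name) -> shards, each with "state" and "node" fields. That layout, the function name shards_by_node, the index name "events", and the node ids are assumptions for illustration and may not match every Elasticsearch version.

```python
from collections import defaultdict


def shards_by_node(cluster_state, index):
    """Map node id -> sorted shard numbers of `index` that have a STARTED copy there.

    Assumes the (hypothetical, version-dependent) layout
    cluster_state["routing_table"]["indices"][index]["shards"][shard_number],
    where each entry is a list of shard copies.
    """
    shards = cluster_state["routing_table"]["indices"][index]["shards"]
    result = defaultdict(set)
    for shard_id, copies in shards.items():
        for copy in copies:
            # Skip unassigned or still-initializing copies: they cannot serve a search.
            if copy.get("state") == "STARTED" and copy.get("node"):
                result[copy["node"]].add(int(shard_id))
    return {node: sorted(ids) for node, ids in result.items()}


# Hypothetical, trimmed-down cluster state used only for illustration.
SAMPLE_STATE = {
    "routing_table": {"indices": {"events": {"shards": {
        "0": [
            {"state": "STARTED", "primary": True, "node": "nodeA"},
            {"state": "STARTED", "primary": False, "node": "nodeB"},
        ],
        "1": [
            {"state": "STARTED", "primary": True, "node": "nodeA"},
            {"state": "UNASSIGNED", "primary": False, "node": None},
        ],
    }}}}
}

if __name__ == "__main__":
    print(shards_by_node(SAMPLE_STATE, "events"))
```

A Hadoop InputFormat could use a mapping like this to create one split per node, although how node ids map to Hadoop hostnames is left open here.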

If this search-limiting functionality doesn't exist, would it be
appropriate for a plugin, or is this a more far-reaching change to
the internals?

Thanks,

--Jason

kimchy · October 12, 2011, 6:38pm

Heya,

Got it. It should be simple to add. We already have a preference
parameter; we just need to find a good way to express what you are after
as a value. Open an issue?

-shay.banon

kimchy · October 12, 2011, 7:11pm

OK, I opened issue #1388 in elastic/elasticsearch on GitHub ("Search / Get
Preference: Add _only_node:[node_id] option") and pushed the
implementation to master. See if it works for you.
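
As a rough sketch of how a REST client might use this option, the Python snippet below builds a search URL that passes the preference value named in the issue title. The helper name only_node_search_url, the base URL, the index name "events", and the node id "nodeA" are hypothetical; the snippet only constructs the URL and does not contact a cluster.

```python
from urllib.parse import quote, urlencode


def only_node_search_url(base_url, index, node_id):
    """Build a _search URL restricted to one node via the preference parameter."""
    # The value format "_only_node:<node_id>" follows the issue title above.
    preference = "_only_node:" + node_id
    query = urlencode({"preference": preference})
    return "{}/{}/_search?{}".format(base_url.rstrip("/"), quote(index, safe=""), query)


if __name__ == "__main__":
    # Hypothetical host, index, and node id.
    print(only_node_search_url("http://localhost:9200", "events", "nodeA"))
```

The node id would come from the cluster state, so each data-local task can target the node running on its own machine.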

jason_3 · October 12, 2011, 9:04pm

Awesome. Thanks Shay. I will check this out tomorrow.
