Is it possible to create some sort of client (Transport client, Node
client, or REST) that does either of the following?
Searches only the shards local to the server the client is
connected to (for a given index/type/etc.)
OR
Lets the client specify which shards the search is limited to (for a
given index/type/etc.)
I don't think the "routing" feature supports limiting a search to a
specified shard number.
Here is my use case. We would like to use ES as both the index and
data store. We need to be able to run MapReduce across all the data
in the datastore periodically. Each node runs ES, a Hadoop DataNode,
and Hadoop TaskTracker. I want to create a Hadoop InputFormat that is
truly data local based on the shards. We evaluated wonderdog and it
works okay, but having truly data local searches would be more
scalable.
If the client could limit searches to specific shards or limit
searches to only local shards then I think this would be possible
since I can determine which shards reside on which machines via
curl -s -XGET 'http://localhost:9200/_cluster/state' (and the
equivalent Java code).
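
To illustrate the shard-to-machine lookup described above, here is a minimal Python sketch that groups started shard copies by node from a cluster state document. The JSON layout it reads (routing_table -> indices -> shards, with "state" and "node" on each copy) is an assumption about the response shape and may differ between ES versions. The function names, the "docs" index, and the node ids are all hypothetical.

```python
import json
import urllib.request
from collections import defaultdict


def fetch_cluster_state(url="http://localhost:9200/_cluster/state"):
    """Fetch the cluster state as a dict (same endpoint as the curl call above)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def shards_by_node(cluster_state):
    """Map node id -> sorted list of (index, shard number) for STARTED copies.

    Assumes the layout routing_table -> indices -> <index> -> shards -> <id>,
    where each shard id maps to a list of copies (primary and replicas).
    """
    result = defaultdict(list)
    indices = cluster_state.get("routing_table", {}).get("indices", {})
    for index_name, index_info in indices.items():
        for shard_id, copies in index_info.get("shards", {}).items():
            for copy in copies:
                # Skip unassigned or still-initializing copies.
                if copy.get("state") == "STARTED" and copy.get("node"):
                    result[copy["node"]].append((index_name, int(shard_id)))
    return {node: sorted(shards) for node, shards in result.items()}


# Hand-written sample in the assumed layout (not real cluster output).
SAMPLE_STATE = {
    "routing_table": {"indices": {"docs": {"shards": {
        "0": [
            {"state": "STARTED", "primary": True, "node": "node-a"},
            {"state": "STARTED", "primary": False, "node": "node-b"},
        ],
        "1": [
            {"state": "STARTED", "primary": True, "node": "node-b"},
            {"state": "UNASSIGNED", "primary": False, "node": None},
        ],
    }}}}
}

if __name__ == "__main__":
    print(shards_by_node(SAMPLE_STATE))
```

An InputFormat could use a mapping like this to create one split per shard and report the hosting machine as the split's preferred location, provided the node id can be resolved to a hostname.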
If this search limiting functionality doesn't exist, would it be
appropriate for a plugin or is this an internals change that is more
far reaching?
On Wed, Oct 12, 2011 at 8:38 PM, Shay Banon kimchy@gmail.com wrote:
Heya,
Got it. It should be simple to add. We already have a preference
parameter; we just need to find a good way to express what you are
after as a value.
Open an issue?
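
As a rough sketch of how a shard restriction might be expressed through the preference parameter mentioned above, the Python snippet below builds a search URL carrying a preference value. The "_shards:0,2" value format is purely an assumption for illustration, since the thread does not settle on a syntax. The helper name, host, and index are hypothetical.

```python
from urllib.parse import urlencode


def shard_limited_search_url(host, index, shard_ids):
    """Build a _search URL whose preference value names the shards to query.

    The "_shards:<ids>" format is an assumed way to encode the restriction,
    not something confirmed in this thread.
    """
    preference = "_shards:" + ",".join(str(s) for s in sorted(shard_ids))
    query = urlencode({"preference": preference})
    return "http://%s/%s/_search?%s" % (host, index, query)


if __name__ == "__main__":
    print(shard_limited_search_url("localhost:9200", "docs", [2, 0]))
```

Each Hadoop task could then pass only the shard numbers that live on its own machine, which keeps the search data-local.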