Question on routing

I am currently using Elasticsearch for real-time log analysis. Unfortunately, I
am having performance issues. To resolve them, I'd like to try custom routing
by timestamp, because our real-time log analysis focuses on windows such as
the last 15 minutes, the last hour, or the last 4 hours. Is it possible to
shard based on a time range? If it's not supported yet, what would be a good
starting point for implementing custom routing logic?

My second question: as I understand it, Elasticsearch's routing logic
currently puts records with the same routing id in the same shard. If the
data has a skewed distribution on the routing field, does Elasticsearch still
keep the shards balanced across the cluster?
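The mechanism described in this question can be illustrated with a small simulation. Elasticsearch picks the target shard as `hash(routing) % number_of_primary_shards`; the sketch below uses `zlib.crc32` purely as a stand-in hash (an assumption: it is not the hash function Elasticsearch itself uses), and the skewed log data and helper names are hypothetical. In this simulation every document that shares a routing value lands on the same shard regardless of how many such documents there are, so routing on its own does nothing to even out document counts per shard.

```python
import zlib
from collections import Counter


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Stand-in for the routing formula hash(routing) % number_of_primary_shards.

    zlib.crc32 is used only for illustration; it is NOT the hash that
    Elasticsearch uses, so real shard numbers will differ.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


def docs_per_shard(routing_values, num_primary_shards):
    """Return a list with the number of documents routed to each shard."""
    counts = Counter(shard_for(r, num_primary_shards) for r in routing_values)
    return [counts.get(s, 0) for s in range(num_primary_shards)]


if __name__ == "__main__":
    # Hypothetical skewed data: one busy hour and two quiet ones.
    docs = ["2012100912"] * 9000 + ["2012100913"] * 500 + ["2012100914"] * 500
    # All 9000 documents of the busy hour end up on a single shard.
    print(docs_per_shard(docs, num_primary_shards=5))
```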

Thank you
Best, Jae

--

Hey Jae,

You would be better off using routing based on the date value in this case.
You can route on any value you want, at index time and at query time; just
add the routing parameter to your request, like this: curl -XPUT
'http://127.0.0.1:9200/index/type/id?routing=your_value'. You should build a
custom routing value from the date (the day), and perhaps add the hour if you
have a lot of logs (routing=2012100912). All documents indexed with the same
routing value will be routed to the same shard. Use the same logic when
querying ES.
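A minimal Python sketch of this scheme, assuming hour-granularity routing keys in the `yyyyMMddHH` form of the example (`2012100912`), UTC timestamps, and the `routing` request parameter, which at search time accepts a comma-separated list of values. The index name `logs`, the type `log`, and all helper names are hypothetical. The sketch only builds the request URLs; it does not send anything to a cluster.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:9200"  # host/port taken from the curl example


def routing_value(ts: datetime) -> str:
    """Hourly routing key in UTC, e.g. 2012-10-09 12:34 UTC -> '2012100912'.

    ts must be timezone-aware; it is converted to UTC first.
    """
    return ts.astimezone(timezone.utc).strftime("%Y%m%d%H")


def routing_values_for_window(now: datetime, window: timedelta) -> list[str]:
    """All hourly routing keys whose hour overlaps [now - window, now]."""
    end = now.astimezone(timezone.utc)
    # Round the window start down to the beginning of its hour.
    hour = (end - window).replace(minute=0, second=0, microsecond=0)
    keys = []
    while hour <= end:
        keys.append(routing_value(hour))
        hour += timedelta(hours=1)
    return keys


def index_url(index: str, doc_type: str, doc_id: str, ts: datetime) -> str:
    """URL for indexing one log document with its hourly routing key."""
    return f"{BASE_URL}/{index}/{doc_type}/{doc_id}?" + urlencode(
        {"routing": routing_value(ts)})


def search_url(index: str, now: datetime, window: timedelta) -> str:
    """Search URL restricted to the shards holding the requested hours."""
    routing = ",".join(routing_values_for_window(now, window))
    # safe="," keeps the comma-separated list readable in the URL.
    return f"{BASE_URL}/{index}/_search?" + urlencode(
        {"routing": routing}, safe=",")


if __name__ == "__main__":
    now = datetime(2012, 10, 9, 12, 34, tzinfo=timezone.utc)
    print(index_url("logs", "log", "1", now))
    print(search_url("logs", now, timedelta(hours=4)))
```

A routing value only selects which shard is searched, and that shard can also hold documents from other hours, so the search body still needs a range filter on the timestamp field to narrow results to exactly the last 15 minutes, hour, or 4 hours.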

Even if your data is too big for one shard, Elasticsearch will spread this
shard over 2 nodes, so your query will still be optimized: it only hits 2
shards instead of all of them.

--

Thanks a lot!

In reply to Loïc Bertron <loic.bertron@gmail.com>, Fri, Nov 9, 2012 at 12:12 PM