I'm evaluating Elasticsearch for a relatively large cluster, with a "per
user"-like query pattern. I went through the forum post and the video
below.
https://groups.google.com/forum/?fromgroups#!searchin/elasticsearch/data$20flow/elasticsearch/49q-_AgQCp8/MRol0t9asEcJ
http://www.elasticsearch.org/videos/big-data-search-and-analytics/
Custom routing by userid makes sense as a way to leverage the partitioned
nature of the queries. Both the post and the video mention that it allows
for large overallocation, e.g. 50 shards.
My question is, where would the bottleneck arise if we use custom routing
with much higher overallocation? Say, 200 shards per node?
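To make the question concrete, here is a minimal sketch of how hash-based routing concentrates a user's documents on one shard. This is not Elasticsearch's actual hash function, just the general scheme shard = hash(routing_value) % number_of_primary_shards; the function name and shard count are made up for illustration:

```python
import hashlib

def shard_for(routing_value, num_primary_shards):
    # Stable hash of the routing value (e.g. a userid), modulo the
    # number of primary shards. Any deterministic hash works for the
    # illustration; md5 is used here only because it is stable across
    # Python processes (unlike the builtin hash()).
    h = int(hashlib.md5(routing_value.encode("utf-8")).hexdigest(), 16)
    return h % num_primary_shards

NUM_SHARDS = 200  # heavy overallocation, as discussed above

# Every document indexed with routing="user-42" lands on the same shard,
# so a query routed by that userid only needs to touch one shard.
assert shard_for("user-42", NUM_SHARDS) == shard_for("user-42", NUM_SHARDS)
```

The per-user query then hits 1 of the 200 shards instead of fanning out to all of them, which is where the overallocation pays off.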
There are concerns about open file handles and index compactness, but
those could be addressed with more aggressive segment merging (perhaps
just one segment per tier). The merging overhead shrinks correspondingly,
since each shard's segments will be proportionately smaller.
Would the master node struggle to keep up with shard allocation and the
associated cluster-state bookkeeping?
How do the filter and fielddata caches work with custom routing? When you
issue a routed query, does it populate the caches just for that shard? Or
for all replicas of that shard, all shards on the node, or all shards in
the index?