In the 0.13 release notes you write there is "Improved Support for Large Number of Indices." How can we get a sense of how many indices can be wisely supported, and when we've gone too far? I'm guessing it would be important to consider how many indices you can leave open at once, and how long it takes to open one up.
We index documents for users; they only need to be able to search against their own data. We've always included a user identifier in the documents and put them all in the same index. But with the 0.13 announcement we are wondering if we can have one index per user and speed up search performance.
Or would a better way to get the same search performance improvement be to use routing based on a custom value (i.e. the user identifier), as discussed in the "Improved Support for Large Number of Indices" section?
Hey,
Let me try to explain the improvement a bit, and then we can have a better picture.

In elasticsearch, there are two main "states" that are stored: one is the cluster state, and the other is the state each node holds. The cluster state includes the metadata for all indices (settings and mapping sources, in JSON form) and the routing information. The node state includes index level state (the parsed mappings tree structure, used for fast Lucene Document creation) and shard level state (the Lucene constructs used to index and search data, since each shard is a Lucene index).

Before 0.13, the index level node state was created on every node (including client nodes). A big refactoring in 0.13 made it possible to index and search using just the cluster state, without needing the node level index data. As a result, index level node structures are now created lazily on each node, only when a shard of that index needs to be allocated on it.

What does it mean? It means that with a big enough cluster, nodes that end up not being allocated a shard of a specific index will not incur the overhead of creating that index's structures.
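To make the lazy creation concrete, here is a toy model in Python. It is only a sketch under stated assumptions, not how elasticsearch is implemented: the `Node` class, the `build_cluster` helper, and the round-robin allocation are all hypothetical and exist purely for illustration. Every node carries the cluster state for all indices, but builds index level structures only for indices that have a shard allocated on it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a node; not an elasticsearch class."""
    name: str
    # Cluster state: metadata for every index, present on every node.
    cluster_state: dict = field(default_factory=dict)
    # Index level node state: built lazily, one entry per index hosted here.
    index_structures: dict = field(default_factory=dict)
    # Shard level state: one entry per shard (each would be a Lucene index).
    shards: list = field(default_factory=list)

    def allocate_shard(self, index, shard_id):
        # Build the index level structures only on first allocation.
        if index not in self.index_structures:
            self.index_structures[index] = f"parsed mappings for {index}"
        self.shards.append((index, shard_id))


def build_cluster(node_names, index_names, shards_per_index):
    """Spread shards over nodes round-robin (an assumed, simplified policy)."""
    metadata = {name: {"number_of_shards": shards_per_index} for name in index_names}
    nodes = [Node(n, cluster_state=dict(metadata)) for n in node_names]
    slot = 0
    for index in index_names:
        for shard_id in range(shards_per_index):
            nodes[slot % len(nodes)].allocate_shard(index, shard_id)
            slot += 1
    return nodes
```

In this model, a large cluster with few shards per index leaves each node with structures for only a handful of indices, while a 2-node cluster with many shards per index ends up building structures for every index on both nodes.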
What does it mean for many indices? It means that when indices are widely distributed across the cluster, so that many nodes end up holding no shards at all for a specific index, the overhead will be much smaller. A shard is still a Lucene index, though, so if you have 2-3 nodes with many indices, you will still have the same possible problems (simply from creating many Lucene indices on the same node).

Regarding routing: yes, it can really help in solving the problem. Use the user name as the routing value when indexing, and provide the same user name as the routing value when searching (you will still need to filter by it, since one shard can hold documents for more than one user). The search will then hit a single shard instead of all shards.

cheers,
-shay.banon
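The routing suggestion in the reply above can be sketched roughly as follows. This is a minimal illustration under several assumptions, not a definitive client: the helper names, the `user` field, and the index and type names are hypothetical; the `zlib.crc32` hash is only a stand-in for whatever hash elasticsearch uses internally; and the `filtered` query body reflects the query DSL of early releases (later releases express the same thing as a `bool` query with a `filter` clause). The functions only build the requests and do not send them.

```python
import json
import zlib
from urllib.parse import urlencode

NUM_SHARDS = 5  # assumed shard count for the shared index


def shard_for(routing_value, num_shards=NUM_SHARDS):
    """Map a routing value to a shard number.

    Stand-in hash for illustration only; elasticsearch uses its own function.
    The point is that the same routing value always lands on the same shard.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


def index_request(index, doc_type, doc_id, user, doc):
    """Build an index request that routes the document by user."""
    body = dict(doc, user=user)  # keep the user in the document for filtering
    path = f"/{index}/{doc_type}/{doc_id}?" + urlencode({"routing": user})
    return "PUT", path, json.dumps(body)


def search_request(index, user, text):
    """Build a search that hits only the user's shard and filters by user."""
    body = {
        "query": {
            "filtered": {
                "query": {"query_string": {"query": text}},
                # Still required: other users may share the same shard.
                "filter": {"term": {"user": user}},
            }
        }
    }
    path = f"/{index}/_search?" + urlencode({"routing": user})
    return "POST", path, json.dumps(body)
```

Because the same routing value is used at index time and at search time, the search only needs to visit the one shard that `shard_for` would select for that user, while the term filter removes documents of other users that happen to hash to the same shard.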
On Saturday, November 20, 2010 at 6:41 AM, John Chang wrote:
In the 0.13 release notes you write there is "Improved Support for Large Number of Indices." How can we get a sense of how many indices can be wisely supported, and when we've gone too far? I'm guessing it would be important to consider how many indices you can leave open at once, and how long it takes to open one up.
We index documents for users; they only need to be able to search against their own data. We've always included a user identifier in the documents and put them all in the same index. But with the 0.13 announcement we are wondering if we can have one index per user and speed up search performance.
Or, would a better way to accomplish the same search performance improvement be to use the routing based on custom value (i.e. user identifier) discussed in the "Improved Support for Large Number of Indices" section?
Thanks and congrats on the 0.13 release!