Hi,
I have a daily index with a size of 400GB, ~300M documents, and 9 primary shards (no replicas).
A new index is created every day, and each time one is created, the previous one is force-merged down to a maximum of 1 segment per shard.
The ES version is 2.4.4, and the cluster state is always green.
Every node has 30G of heap, and there are 14 data nodes in total.
I chose 9 shards for this index so that each shard holds about 45G of data (in many places I read that a good rule of thumb is to have up to 50G of data per shard).
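
To make the sizing arithmetic concrete, here is a small sketch of how the numbers work out. The function names are just for illustration, and it assumes data is spread evenly across the primary shards, which is only approximately true in practice.

```python
import math

def min_shards(index_size_gb, max_shard_gb):
    """Smallest shard count that keeps every shard at or under max_shard_gb."""
    return math.ceil(index_size_gb / max_shard_gb)

def per_shard(total, shards):
    """Even split of a total (GB or document count) across primary shards."""
    return total / shards

index_size_gb = 400        # daily index size from the post
doc_count = 300_000_000    # ~300M documents per day
shards = 9                 # primary shards chosen

print(min_shards(index_size_gb, 50))                  # 8
print(round(per_shard(index_size_gb, shards), 1))     # 44.4 (GB per shard)
print(round(per_shard(doc_count, shards) / 1e6, 1))   # 33.3 (million docs per shard)
```

So 8 shards would sit right at the 50G limit, and 9 leaves a bit of headroom.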
The index holds performance metrics from different systems in different geo-locations.
One of the fields in each document is the location it was indexed from (United States, Germany, Israel and so on). In Kibana, a user will always select the location first and only then drill down further into the data.
This means that a single index contains many documents that are of no interest to the end user (those from other geo-locations).
Would it be better, performance-wise, to have an index per location, so that ES does not have to scan through so many documents that are unrelated to a search query? This means I would have to create the same dashboard several times (once per location). Using aliases can prevent that, but it brings me back to the first problem: I would be searching many indices that I know do not contain the data I need, which hurts query time.
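
To illustrate the layout I have in mind, here is a rough sketch that builds the request body for the `_aliases` API, adding one daily index per location to a single shared alias. The index naming scheme (`metrics-<location>-<day>`), the alias name `metrics-all`, and the location codes are all made up for the example; nothing is sent to a cluster.

```python
import json

def shared_alias_actions(alias, base, day, locations):
    """Build an _aliases body that adds one per-location daily index to a shared alias."""
    actions = []
    for loc in locations:
        # Hypothetical naming scheme: <base>-<location>-<day>
        index = f"{base}-{loc}-{day}"
        actions.append({"add": {"index": index, "alias": alias}})
    return {"actions": actions}

body = shared_alias_actions("metrics-all", "metrics", "2017.05.01", ["us", "de", "il"])
print(json.dumps(body, indent=2))
```

A single dashboard pointed at `metrics-all` would then hit every per-location index on each query, which is exactly the case I am worried about.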
More generally: if I run an aggregation search on a term that I know does not exist, should I expect it to return almost immediately?
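
To be specific about the kind of request I mean, here is a sketch of the search body. The field names `location` and `response_time` and the value `Atlantis` are placeholders, not my real mapping.

```python
import json

def missing_term_agg(field, value, metric_field):
    """Search body: filter on a term value, return no hits, aggregate one metric."""
    return {
        "size": 0,  # only the aggregation result is wanted, not the documents
        "query": {"term": {field: value}},
        "aggs": {"avg_metric": {"avg": {"field": metric_field}}},
    }

# "Atlantis" stands in for a location that no document contains.
body = missing_term_agg("location", "Atlantis", "response_time")
print(json.dumps(body, indent=2))
```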
Thanks,
Ofer