Big index vs. small ones: querying only a subset of the data each time


(Ofer Kesten) #1

Hi,

I have a daily index with a size of 400GB, ~300M documents, and 9 primary shards (no replicas).
Indices are created daily, and whenever a new index is created, the previous one is force-merged down to a maximum of 1 segment per shard.
The ES version is 2.4.4, and the cluster state is always green.
Every node has a 30GB heap, and there are 14 data nodes in total.
I chose 9 shards for this index so that each shard holds about 45GB of data (in many places I read that a good rule of thumb is to keep shards at or below 50GB).
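As a quick sanity check of the sizing above, here is a minimal Python sketch that assumes data spreads evenly across the primary shards (real shards will vary somewhat). The constants come from the post; `min_shards` is a hypothetical helper, and the 50GB ceiling is only the rule of thumb mentioned there, not a hard limit.

```python
import math

# Figures from the post: daily index size, document count, primary shards.
INDEX_SIZE_GB = 400
DOC_COUNT = 300_000_000
PRIMARY_SHARDS = 9
MAX_SHARD_GB = 50  # rule-of-thumb ceiling per shard (assumption, not a limit)


def per_shard(total: float, shards: int) -> float:
    """Average amount per primary shard, assuming an even spread."""
    return total / shards


def min_shards(total_gb: float, max_shard_gb: float) -> int:
    """Smallest shard count that keeps the average shard at or under the ceiling."""
    return math.ceil(total_gb / max_shard_gb)


gb_per_shard = per_shard(INDEX_SIZE_GB, PRIMARY_SHARDS)
docs_per_shard = per_shard(DOC_COUNT, PRIMARY_SHARDS)

print(f"{gb_per_shard:.1f} GB per shard")              # 44.4 GB per shard
print(f"{docs_per_shard / 1e6:.1f}M docs per shard")   # 33.3M docs per shard
print(min_shards(INDEX_SIZE_GB, MAX_SHARD_GB), "shards at minimum")  # 8
```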
The index holds performance metrics from different systems in different geo-locations.
One of the fields in each document is the location it was indexed from (United States, Germany, Israel and so on). In Kibana, a user will always select the location first and only then drill down further into the data.
This means that a single index contains many documents that are of no interest to the end user (those from other geo-locations).
Would it be better, performance-wise, to have an index per location, so that ES does not have to scan through so many documents that are unrelated to a search query? That would mean creating the same dashboard several times (once per location). Aliases can prevent that, but they bring me back to the original problem: I would be searching many indices that I know do not contain the data I need, which hurts query time.
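A minimal sketch of the per-location layout being considered, assuming a hypothetical naming scheme `metrics-<location>-YYYY.MM.DD` and hypothetical location codes. It only builds the body for the `POST /_aliases` endpoint, so that one alias per day spans every location's index; a location-specific dashboard could instead target just that location's indices. Nothing is sent to a cluster here.

```python
import json
from datetime import date

LOCATIONS = ["us", "de", "il"]  # hypothetical short codes for the locations


def per_location_index(prefix: str, location: str, day: date) -> str:
    # Hypothetical naming scheme: <prefix>-<location>-YYYY.MM.DD
    return f"{prefix}-{location}-{day:%Y.%m.%d}"


def alias_actions(prefix: str, locations: list, day: date) -> dict:
    """Body for POST /_aliases that puts one day's per-location indices
    behind a single alias named <prefix>-YYYY.MM.DD."""
    alias = f"{prefix}-{day:%Y.%m.%d}"
    return {
        "actions": [
            {"add": {"index": per_location_index(prefix, loc, day), "alias": alias}}
            for loc in locations
        ]
    }


body = alias_actions("metrics", LOCATIONS, date(2017, 5, 1))
print(json.dumps(body, indent=2))
```

The trade-off raised in the post still applies: a query against the alias fans out to every location's index, even when only one of them can contain matching documents.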
More generally: if I run an aggregation search on a term that I know does not exist, should I expect it to return almost immediately?

Thanks,
Ofer


(Mark Walkom) #2

Adding filters will always improve performance (assuming they are applied). So it might make sense to have some prebuilt filters at the top of the dashboards that let you choose between locations.

Are you having performance issues now?

No, Elasticsearch still needs to run the agg to know there are no values. :)


(Ofer Kesten) #3

By adding filters, I assume you are referring to the Kibana search bar. That can be done: I can save dashboards with predefined filters for the locations. However, this still searches all of those unrelated documents (from other locations that do not match the filter), correct?
So in that case I am still wondering whether a few small indices holding only data that is relevant to the search would have an advantage over one big index.
There is the overhead of maintaining more indices, but if search queries run faster, perhaps it is worth it.


(Mark Walkom) #4

Any filters, in Kibana or directly in JSON.

The idea of filters is to search only the documents that match the filter being applied. So no.
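For the JSON route, here is a minimal sketch of a search body that puts the location restriction in a `bool` query's `filter` clause (available from ES 2.0), so the aggregation runs only over the documents that pass it. The field names `location` and `system` are assumptions, and the exact-match `term` filter assumes `location` is mapped as a `not_analyzed` string; against an analyzed field, `"Germany"` would not match the lowercased indexed token.

```python
import json


def location_query(location: str, agg_field: str) -> dict:
    """Search body: restrict to one location in filter context, then aggregate.

    'location' and agg_field are hypothetical field names; adjust them to the
    real mapping. The term filter assumes a not_analyzed string field (ES 2.x).
    """
    return {
        "size": 0,  # only the aggregation results are needed, not the hits
        "query": {
            "bool": {
                # Filter context: matches without scoring and can be cached.
                "filter": [{"term": {"location": location}}]
            }
        },
        "aggs": {
            # Buckets are computed only from documents that passed the filter.
            "per_value": {"terms": {"field": agg_field}}
        },
    }


body = location_query("Germany", "system")
print(json.dumps(body, indent=2))
```

The printed body could be sent as-is to `GET /<index>/_search`.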


(Ofer Kesten) #5

Thanks for the details.

