Big index vs small ones - querying only subset of the data each time

Oferkes · October 6, 2017, 7:06pm

Hi,

I have a daily index of about 400 GB, with ~300M documents and 9 primary shards (no replicas).
Indices are created daily, and whenever a new index is created, the previous one is force-merged down to a maximum of 1 segment per shard.
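The daily force-merge step can be sketched in Python as below. This only builds the HTTP request for the _forcemerge API (available since ES 2.1, so it applies to 2.4.4); actually sending it is left to whatever client you use. The "metrics-YYYY.MM.DD" index naming is a hypothetical assumption, not taken from this thread.

```python
from datetime import date, timedelta

# Hypothetical naming scheme: one index per day, e.g. "metrics-2017.10.06".
INDEX_PREFIX = "metrics-"


def daily_index(day: date) -> str:
    """Return the (hypothetical) name of the daily index for `day`."""
    return f"{INDEX_PREFIX}{day:%Y.%m.%d}"


def forcemerge_previous(today: date) -> tuple[str, str]:
    """Build the method and path that force-merge yesterday's index
    down to one segment per shard."""
    previous = daily_index(today - timedelta(days=1))
    return "POST", f"/{previous}/_forcemerge?max_num_segments=1"


if __name__ == "__main__":
    method, path = forcemerge_previous(date(2017, 10, 7))
    print(method, path)
```

Merging only the previous day's index keeps the expensive merge away from the index that is still receiving writes.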
The ES version is 2.4.4, and the cluster state is always green.
Every node has a 30 GB heap, and there are 14 data nodes in total.
I chose 9 shards for this index so that each shard holds about 45 GB of data (in many places I read that a good rule of thumb is up to 50 GB per shard).
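The sizing arithmetic behind the 9-shard choice, as a small Python sketch. The 50 GB ceiling is the rule of thumb quoted above, not a hard Elasticsearch limit, and the sketch assumes data spreads evenly across shards.

```python
import math


def min_primary_shards(index_size_gb: float, max_shard_gb: float = 50.0) -> int:
    """Smallest primary-shard count that keeps every shard at or below max_shard_gb."""
    return max(1, math.ceil(index_size_gb / max_shard_gb))


def shard_size_gb(index_size_gb: float, primary_shards: int) -> float:
    """Average data per primary shard, assuming an even spread."""
    return index_size_gb / primary_shards


if __name__ == "__main__":
    print(min_primary_shards(400))          # 8 shards would sit exactly at 50 GB each
    print(round(shard_size_gb(400, 9), 1))  # 44.4 GB per shard with the chosen 9
```

Going from the minimum of 8 shards to 9 leaves some headroom below the ceiling if a day's volume grows.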
The index holds performance metrics from different systems in different geo-locations.
One of the fields in each document is the location it was indexed from (United States, Germany, Israel and so on). In Kibana, a user always selects the location first and only then drills down further into the data.
This means that one index contains many documents that are of no interest to the end user (those from other geo-locations).
Would it be better, performance-wise, to have an index per location, so that ES does not have to scan through so many documents that are unrelated to a search query? That would mean creating the same dashboard several times (once per location). Aliases can prevent that, but they bring me back to the first problem: I would be searching many indices that I know do not contain the data I need, which hurts query time.
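One middle ground, sketched here as an assumption rather than something proposed in the thread, is a filtered alias per location over the existing daily index: each dashboard can point at its own alias while the data stays in one index. The sketch builds the body for POST /_aliases. The alias names ("metrics-us" and so on) and the field name "location" are hypothetical, and an exact term filter on values such as "United States" assumes the field is mapped as not_analyzed (ES 2.x).

```python
def location_alias_actions(index: str, locations: dict[str, str]) -> dict:
    """Build a POST /_aliases body adding one filtered alias per location.

    `locations` maps a hypothetical alias suffix to the exact value stored
    in the location field, e.g. {"us": "United States"}.
    """
    actions = []
    for suffix, value in sorted(locations.items()):
        actions.append({
            "add": {
                "index": index,
                "alias": f"metrics-{suffix}",
                # Exact-match filter applied to every search through this alias;
                # assumes "location" is not_analyzed in the ES 2.x mapping.
                "filter": {"term": {"location": value}},
            }
        })
    return {"actions": actions}


if __name__ == "__main__":
    import json
    body = location_alias_actions("metrics-2017.10.06",
                                  {"us": "United States", "de": "Germany"})
    print(json.dumps(body, indent=2))
```

A filtered alias is just a stored filter: searches through it still hit the one large index, so it behaves like adding the location filter to every query rather than like physically separate indices.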
More generally: if I run an aggregation search on a term that I know does not exist, should I expect it to return almost immediately?

Thanks,
Ofer

warkolm · October 6, 2017, 9:04pm

Adding filters will always improve performance (assuming they are applied). So it might make sense to have some prebuilt filters at the top of the dashboards that let you choose between locations.
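A filter like that ends up in the search body roughly as below, sketched in Python as a plain dict. The location restriction goes in the bool query's filter clause (supported in ES 2.x), which does not affect scoring and whose results Elasticsearch may cache. The field names "location" and "system" are hypothetical placeholders for the real mapping.

```python
def location_query(location: str, agg_field: str) -> dict:
    """Search body: restrict to one location in filter context, then aggregate."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits themselves
        "query": {
            "bool": {
                "filter": [
                    {"term": {"location": location}},
                ]
            }
        },
        "aggs": {
            "by_field": {"terms": {"field": agg_field}},
        },
    }


if __name__ == "__main__":
    import json
    print(json.dumps(location_query("Germany", "system"), indent=2))
```

The aggregation then runs only over documents that passed the filter, which is the behavior the Kibana filter bar relies on.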

Are you having performance issues now?

No. Elasticsearch still needs to run the aggregation to know that there are no values.

Oferkes · October 7, 2017, 7:26pm

By adding filters, I assume you are referring to the Kibana search bar. That can be done: I can save dashboards with predefined filters for the locations. However, this still searches all those unrelated documents (from other locations that do not match the filter), correct?
So I am still wondering whether a few small indices holding only data that is relevant to the search would have an advantage over one big index.
There is the overhead of maintaining more indices, but if search queries run faster, perhaps it is worth it.

warkolm · October 7, 2017, 7:28pm

Any filters, whether in Kibana or directly in the JSON query.

The idea of filters is to search only the items that match the filter being applied. So no.

Oferkes · October 8, 2017, 5:13am

Thanks for the details.

system · November 5, 2017, 5:13am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Category	Replies	Views	Activity
Dashboard taking much time to load	Elasticsearch	11	6255	July 5, 2017
Elasticsearch performance tuning doubts	Elasticsearch	8	964	June 30, 2019
Small vs Large indices	Elasticsearch	7	7833	July 5, 2017
Filtered Aliases VS Separate indexes	Elasticsearch	12	1416	July 15, 2017
Daily index and monthly index query performance difference	Elasticsearch	3	1003	December 22, 2020