Enhance performance using "rounded" timestamp

shushu · May 31, 2016, 10:35am

Hello,
We have an index that gets ~15M documents a day, each with an epoch-milliseconds timestamp.
Date aggregations seem to work fine, but the bigger the index, the longer the query takes.

One option would be to split the index per day, which would give us flexibility and much better performance.
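A per-day layout is usually driven by naming each index after the day its documents belong to. The sketch below assumes a hypothetical `events-YYYY.MM.DD` naming scheme and UTC day boundaries; it picks the index for a single document and lists the daily indices that a time-range query would need to touch:

```python
from datetime import datetime, timedelta, timezone

PREFIX = "events"  # hypothetical index-name prefix


def daily_index_name(epoch_ms: int, prefix: str = PREFIX) -> str:
    """Name of the per-day index for an epoch-milliseconds timestamp (UTC day)."""
    # Integer division avoids float rounding pushing a value across midnight.
    day = datetime.fromtimestamp(epoch_ms // 1000, tz=timezone.utc).date()
    return f"{prefix}-{day:%Y.%m.%d}"


def indices_for_range(start_ms: int, end_ms: int, prefix: str = PREFIX) -> list[str]:
    """All daily index names covering the inclusive range [start_ms, end_ms]."""
    day = datetime.fromtimestamp(start_ms // 1000, tz=timezone.utc).date()
    last = datetime.fromtimestamp(end_ms // 1000, tz=timezone.utc).date()
    names = []
    while day <= last:
        names.append(f"{prefix}-{day:%Y.%m.%d}")
        day += timedelta(days=1)
    return names
```

With this layout, a query over the last few hours only has to search one or two daily indices rather than the whole history.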

Another idea was to "round" the millisecond timestamp down to the minute, so that instead of storing the exact millisecond, each value would be indexed as the millisecond timestamp of the start of its minute.
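As a minimal sketch of the rounding described above (assuming the timestamp is an integer number of epoch milliseconds and that the rounding is done by the client before indexing; the name `round_to_minute` is hypothetical), truncating to the start of the minute is a single modulo operation:

```python
MINUTE_MS = 60_000  # one minute in milliseconds


def round_to_minute(epoch_ms: int) -> int:
    """Truncate an epoch-milliseconds timestamp to the start of its minute."""
    # Python's % returns a non-negative remainder for a positive divisor,
    # so negative (pre-1970) timestamps are floored, not rounded toward zero.
    return epoch_ms - epoch_ms % MINUTE_MS
```

For example, `round_to_minute(1464690945123)` (10:35:45.123 UTC on May 31, 2016) returns `1464690900000` (10:35:00.000 UTC).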

My question is: would this improve performance?
Since the index for that field would be much smaller (60,000 times smaller), would it make queries faster?

I could not find any example of this, nor any recommendation for doing it.
Regards,
Shushu

colings86 · May 31, 2016, 12:33pm

If your use case allows you to round dates to minute resolution at index time, this would definitely be a good thing to do, as it would reduce the index size as you said. The saving comes partly from having fewer terms in the inverted index, and partly because the GCD compression used in doc values and the compression of the source field would both be more efficient.
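To illustrate the doc-values part of that saving, here is a simplified sketch of GCD-style compression: subtract the minimum, divide every delta by the greatest common divisor of all deltas, and count the bits needed for the largest quotient. This is a toy under stated assumptions, not Lucene's actual doc-values code, and the one-document-per-minute data set (with made-up sub-minute jitter) is hypothetical:

```python
from functools import reduce
from math import gcd


def gcd_packed_bits(values: list[int]) -> tuple[int, int]:
    """Bits per value without and with a shared-GCD step.

    Toy illustration: store (v - min) directly, or (v - min) // g where g is the
    greatest common divisor of all deltas. Not Lucene's actual format.
    """
    lo = min(values)
    deltas = [v - lo for v in values]
    largest = max(deltas)
    plain_bits = largest.bit_length()
    g = reduce(gcd, deltas)  # gcd(0, x) == x, so the zero delta is harmless
    gcd_bits = (largest // g).bit_length() if g else 0
    return plain_bits, gcd_bits


t0 = 1464652800000  # 2016-05-31T00:00:00Z, already a whole minute
# Hypothetical data: one document per minute for a day, with made-up
# sub-minute jitter so the raw values do not share a common divisor.
raw = [t0 + m * 60_000 + (m * m) % 60_000 for m in range(1440)]
rounded = [v - v % 60_000 for v in raw]

print(gcd_packed_bits(raw))      # (27, 27)
print(gcd_packed_bits(rounded))  # (27, 11)
```

With millisecond values the deltas share no common divisor, so both counts come out at 27 bits per value; after rounding to the minute the GCD becomes 60,000 and each value needs only 11 bits.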

In terms of query performance, I am not sure you would see a lot of improvement. In 2.x, numeric fields (date fields are actually indexed as long fields) use trie encoding to enable faster range queries. This indexes each value as multiple terms at different resolutions (e.g. in a base-10 trie encoding, 124 could be indexed as 124, 120 and 100). This means that at query time the number of terms we need to search can be minimised by using these different levels of resolution. Rounding your values to the nearest minute will mean you have fewer terms at the lowest levels (hence the reduction in inverted index size), but for most ranges only a few of these terms will be used anyway. You may see an increase in query performance for small ranges, where the proportion of these low-level terms is high compared with the total number of terms used for the query, but I would have thought that for long ranges you would not see a significant performance increase, since the proportion of low-level terms used would be small.
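The base-10 trie in the example above can be modelled in a few lines. This toy assumes non-negative integer values and a greedy split of the query range into aligned blocks; Lucene actually uses a binary encoding controlled by a precision step, so the numbers are only indicative. Each term is a `(level, start)` pair matching `[start, start + 10**level - 1]`, and rounding is simulated by truncating values to multiples of 10:

```python
def trie_terms(value: int, max_level: int = 2) -> set[tuple[int, int]]:
    """Toy base-10 trie: 124 -> {(0, 124), (1, 120), (2, 100)}.

    A term (level, start) matches every value in [start, start + 10**level - 1].
    """
    return {(level, value - value % 10**level) for level in range(max_level + 1)}


def range_terms(lo: int, hi: int, max_level: int = 2) -> list[tuple[int, int]]:
    """Greedily cover the inclusive range [lo, hi] with aligned trie blocks."""
    terms = []
    while lo <= hi:
        level = 0
        # Use a coarser block while it stays aligned and inside the range.
        while (level < max_level
               and lo % 10 ** (level + 1) == 0
               and lo + 10 ** (level + 1) - 1 <= hi):
            level += 1
        terms.append((level, lo))
        lo += 10**level
    return terms


def terms_visited(values, lo: int, hi: int, max_level: int = 2) -> int:
    """Number of query terms that actually exist in an index of `values`."""
    index = set()
    for v in values:
        index |= trie_terms(v, max_level)
    return sum(1 for term in range_terms(lo, hi, max_level) if term in index)


raw = range(1000)                    # every "millisecond" has a document
rounded = [v - v % 10 for v in raw]  # simulated rounding to multiples of 10
for lo, hi in [(123, 156), (3, 997)]:
    print((lo, hi), terms_visited(raw, lo, hi), terms_visited(rounded, lo, hi))
```

In this toy, the short range `[123, 156]` drops from 16 visited terms to 3, while the long range `[3, 997]` only drops from 41 to 27: the extra low-level terms appear only at the two ends of a range, so their share shrinks as the range grows. The toy also ignores the work of collecting the matching documents, whose number rounding does not change.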

So the upshot of this is that you should do this if you can because you should see a good reduction in index size, but you may not see any change in query performance.

Hope that helps

shushu · May 31, 2016, 1:12pm

Thanks !
It helps, though it means I'd rather not spend time on this, since my main goal was to improve query performance.
It is cool to know Elasticsearch has these kinds of capabilities built in.
