Pre-Sorted Doc Values


(Harlin) #1

I was looking into Lucene and noticed that it has a SortedDocValues class. Does anybody know whether Elasticsearch's doc_values already use this class, or whether Elastic plans to implement anything like it in a future version?

Being able to store field data pre-sorted would drastically improve query speed in my use case.

Thanks,
Harlin


(lisak) #2

That would be an awesome feature. I just found out that sort-scrolling anything with 10+ million documents (with doc_values) is so slow that it doesn't make sense to do it: 30M takes hours and 100M takes hundreds of hours. And I don't think that scaling up horizontally or vertically would help...


(Adrien Grand) #3

Yes, SortedDocValues/SortedSetDocValues are what Elasticsearch uses to store doc values on not_analyzed string fields.

However, I'm not sure it does what you think it does: this class just maps every unique value to an ordinal, and then every document to the ordinals of the values it contains. We later use those ordinals for sorting and aggregations.
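
To make the ordinal mapping concrete, here is a minimal Python sketch of the idea for a single-valued field. It is not Lucene's implementation or on-disk format; the function name `build_sorted_doc_values` and the sample data are made up for illustration, and it assumes ordinals are assigned in the sorted order of the unique values.

```python
def build_sorted_doc_values(docs):
    """docs: one string value per document; the list index is the doc ID.

    Hypothetical helper, only meant to illustrate the ordinal mapping.
    """
    # Unique values in sorted order; a value's position is its ordinal.
    terms = sorted(set(docs))
    ord_of = {term: i for i, term in enumerate(terms)}
    # Per-document column: doc ID -> ordinal of that document's value.
    doc_to_ord = [ord_of[value] for value in docs]
    return terms, doc_to_ord


docs = ["pear", "apple", "pear", "fig"]
terms, doc_to_ord = build_sorted_doc_values(docs)

# Sorting compares small integers instead of strings, but it still has
# to look at every matching document: the documents themselves are not
# stored in sorted order.
sorted_doc_ids = sorted(range(len(docs)), key=lambda doc_id: doc_to_ord[doc_id])
```

So the "sorted" in the name refers to the values behind the ordinals, not to the order in which documents are stored.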

I think that the feature you are after is rather index sorting, which is not implemented yet: https://github.com/elastic/elasticsearch/issues/6720
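
As a rough illustration of why index sorting would help, the Python sketch below compares a top-k query over documents in arbitrary order with one over documents already stored in sort order. This is only a conceptual model under my own assumptions, not how Elasticsearch or Lucene would implement it; the function names, the `ts` field, and the sample data are hypothetical.

```python
from itertools import islice


def top_k_unsorted(docs, matches, k):
    # Documents are in arbitrary order: collect every match, then sort.
    hits = [doc for doc in docs if matches(doc)]
    return sorted(hits, key=lambda doc: doc["ts"])[:k]


def top_k_index_sorted(sorted_docs, matches, k):
    # Documents are already stored in ts order: the first k matches are
    # the answer, so iteration can stop early.
    return list(islice((doc for doc in sorted_docs if matches(doc)), k))


docs = [{"id": i, "ts": ts} for i, ts in enumerate([50, 10, 40, 20, 30])]
matches = lambda doc: doc["ts"] != 20  # stand-in for a real query

unsorted_result = top_k_unsorted(docs, matches, 2)

# Simulate an index whose documents were sorted by ts at write time.
sorted_docs = sorted(docs, key=lambda doc: doc["ts"])
index_sorted_result = top_k_index_sorted(sorted_docs, matches, 2)
```

Both calls return the same two documents, but the second one only walks the index until it has found k matches.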


(lisak) #4

Good point. By the way, regarding doc_values: is it possible to enable doc_values on the _timestamp field? Sorting on _timestamp might be the most frequent use case.


(Adrien Grand) #5

Yes, _timestamp supports doc values.

