Uncompressed DocValues


I'm running an aggregation-heavy application. I have plenty of RAM and would like to improve terms aggregation performance on a field of "long" values. Is it possible to instruct ES to leave the doc values for this field uncompressed to improve performance (there is plenty of RAM to hold the array)? Would this be of any benefit?
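
For illustration, a terms aggregation of the kind described might look something like the sketch below. The field name `user_id`, the aggregation name `by_value`, and the `size` value are assumptions made up for the example, not details from the actual application.

```python
import json


def terms_aggregation_body(field: str, size: int = 10) -> dict:
    """Build a search body that runs only a terms aggregation on `field`."""
    return {
        "size": 0,  # return no hits, only the aggregation buckets
        "aggs": {
            "by_value": {  # hypothetical aggregation name
                "terms": {"field": field, "size": size}
            }
        },
    }


# "user_id" is a placeholder for the "long" field in question.
body = terms_aggregation_body("user_id", size=20)
print(json.dumps(body, indent=2))
```

The printed JSON is what would be sent as the body of a search request against the index.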

(Mark Walkom) #2

I don't think this would be much of an improvement.
Reduced disk size means less data to read off disk, which is a good thing, and the CPU overhead of expanding that data is going to be negligible.

That said, I don't think you can specify individual fields to remain uncompressed. The closest thing to what you want is the old field data representation that doc values replaced.


@warkolm Thanks. That makes sense.

Is it still possible in any way to override the default Lucene codec for a field? If not for a field, then for the whole index? I can think of a few special cases where this might still be useful, but I get that it's a use-at-your-own-risk kind of thing.

(Shay Banon) #4

Note that there is a difference between the compression Lucene applies by default to stored fields (_source) and how it handles doc values. Doc values are not compressed in the sense of having LZ4 applied to them; they may be represented more compactly on disk depending on the type (numerics, for example), but that's it.
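
To give a feel for what a "better representation" of numerics can mean, here is a minimal sketch of one such idea: store each value as an offset from the minimum, using only as many bits as the largest offset needs. This is a simplified illustration under that assumption, not Lucene's actual on-disk format; the function names and sample values are made up for the example.

```python
def bits_required(value: int) -> int:
    """Number of bits needed to store a non-negative integer (at least 1)."""
    return max(1, value.bit_length())


def pack_longs(values: list[int]) -> tuple[int, int, bytes]:
    """Encode values as (minimum, bits_per_value, packed_bytes)."""
    if not values:
        return 0, 0, b""
    minimum = min(values)
    deltas = [v - minimum for v in values]  # offsets from the minimum
    bits = bits_required(max(deltas))
    packed = 0
    for i, delta in enumerate(deltas):
        packed |= delta << (i * bits)  # place each offset in its own bit slot
    n_bytes = (len(deltas) * bits + 7) // 8
    return minimum, bits, packed.to_bytes(n_bytes, "little")


def unpack_long(minimum: int, bits: int, packed: bytes, index: int) -> int:
    """Read one value back with a shift and a mask; no decompression pass.

    Decoding the whole buffer on every call keeps the sketch short; a real
    reader would address the bytes directly.
    """
    as_int = int.from_bytes(packed, "little")
    mask = (1 << bits) - 1
    return minimum + ((as_int >> (index * bits)) & mask)


# Hypothetical sample: four timestamp-like longs that sit close together.
values = [1_700_000_000_000, 1_700_000_000_123, 1_700_000_000_999, 1_700_000_000_042]
minimum, bits, packed = pack_longs(values)
restored = [unpack_long(minimum, bits, packed, i) for i in range(len(values))]
```

In this sample the four values fit in 10 bits each (5 bytes in total) instead of 8 bytes each, and reading a value back costs a shift, a mask, and an add rather than a block decompression step.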
