Unsorted numeric doc values

(Paweł Róg) #1

Is there any way to store floating-point numeric doc values unsorted?

To give you a little background: I send documents to ES that have a field containing a vector of floats. Later I use these float values in a scoring script. The order of the values in the vector matters, but in my script function I always get them sorted. I quickly scanned the ES code and saw many places where sorted doc values are used. Is there any way to change this behavior? Of course, I could use binary data, but that would be inconvenient because it would mean reinventing the wheel.

Thanks :)

(Zachary Tong) #2

Unfortunately, there's no way to opt out of sorted doc values. The sorting is done to help compression (when sorted, the values delta-compress better).
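As a rough illustration of why sorted values delta-compress better, here is a toy Python sketch. It uses small integers and a made-up fixed-width delta scheme (the `deltas` and `bits_per_delta` helpers are hypothetical); it is not Lucene's actual encoding, only the general idea.

```python
def deltas(values):
    """Differences between consecutive values."""
    return [b - a for a, b in zip(values, values[1:])]


def bits_per_delta(values):
    """Bits needed to store every delta at one fixed width.

    Assumption for this toy model: one extra sign bit is needed
    as soon as any delta is negative.
    """
    ds = deltas(values)
    magnitude_bits = max(abs(d) for d in ds).bit_length()
    sign_bit = 1 if any(d < 0 for d in ds) else 0
    return magnitude_bits + sign_bit


original = [907, 12, 455, 13, 906]    # order as sent in the document
as_stored = sorted(original)          # order a sorted encoding would keep

print("unsorted deltas:", deltas(original), bits_per_delta(original), "bits each")
print("sorted deltas:  ", deltas(as_stored), bits_per_delta(as_stored), "bits each")
```

Sorted input guarantees non-negative deltas that are usually smaller, so each one fits in fewer bits. The cost is exactly the one described in this thread: the original order is gone.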

There are three workarounds, but none are great. The first is to access the document's _source, which gives you the original JSON. Unfortunately, the source has to be loaded off disk and decompressed for each document, so it'll be a lot slower than doc values.

The second option is to store the values in a nested field, where each value is a different "nested document". This maintains order but isn't really useful from a script.

The last option is to store the vector in a string, and then parse that string into numerics inside your script. It'll have a bit of overhead due to the string parsing, and a string won't compress as well in the index as an array of numerics, but you'll be able to retain ordering.
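A minimal sketch of the string workaround, written in Python only to show the round trip. In Elasticsearch the parsing half would live in your scoring script, and the comma separator and the `encode_vector` / `decode_vector` names are assumptions made for this example.

```python
def encode_vector(vector):
    """Join floats into one comma-separated string, keeping their order.

    repr() is used so each float parses back to the same value.
    """
    return ",".join(repr(float(v)) for v in vector)


def decode_vector(text):
    """Parse the comma-separated string back into a list of floats."""
    if not text:
        return []
    return [float(part) for part in text.split(",")]


vector = [0.3, -1.5, 0.1]
stored = encode_vector(vector)      # index this as a single string value
restored = decode_vector(stored)    # what the script would compute per document

print(stored)
print(restored == vector)
```

The split-and-parse step runs for every scored document, which is the overhead mentioned above.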

(Paweł Róg) #3

Thank you for your answer, @polyfractal. I decided to go with a fourth option, one I had wanted to avoid but which should be the fastest: binary doc values. Thank you very much once again :)
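For anyone taking the same route, here is a hedged Python sketch of packing an ordered float vector into bytes. The thread does not say which layout was used; big-endian 32-bit floats and the `pack_vector` / `unpack_vector` names are assumptions. The result is Base64-encoded because that is how Elasticsearch's `binary` field type expects its value.

```python
import base64
import struct


def pack_vector(vector):
    """Pack floats, in order, as big-endian float32 and Base64-encode them."""
    raw = struct.pack(f">{len(vector)}f", *vector)
    return base64.b64encode(raw).decode("ascii")


def unpack_vector(encoded):
    """Reverse pack_vector: Base64-decode, then read 4 bytes per float."""
    raw = base64.b64decode(encoded)
    count = len(raw) // 4
    return list(struct.unpack(f">{count}f", raw))


vector = [0.5, -1.25, 3.0]          # values exactly representable as float32
encoded = pack_vector(vector)       # value to send for the binary field
print(encoded)
print(unpack_vector(encoded) == vector)
```

Note that float32 drops precision for values such as 0.1, so use `d` (64-bit) in the format string if exact round-tripping of doubles matters.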
