Hello.
Is there any way to store floating-point numeric doc values unsorted?
To give you a little background: I send documents to ES, and they have a field which is a vector of floats. Later I use these float values in a scoring script. The order of the values in the vector matters, but in my script function I always get them sorted. I quickly scanned the ES code and saw many places where sorted doc values are used. Is there any way to change this behavior? Of course, I could use binary data, but that would be inconvenient because it would be like reinventing the wheel.
Unfortunately, there's no way to opt out of sorted doc values. The sorting is done to help compression (when sorted, the values delta-compress better).
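To make the compression point concrete, here is a toy sketch in Python. It is not Lucene's actual encoding; it only illustrates why sorted values delta-encode into small, non-negative gaps, and why the original order cannot be recovered afterwards. The sample values and the `deltas` helper are made up for illustration, and integers are used for simplicity.

```python
def deltas(values):
    """Return the differences between consecutive values."""
    return [b - a for a, b in zip(values, values[1:])]


# Hypothetical field values in the order the document supplied them.
original = [900, 3, 512, 7, 400]

# Unsorted: gaps swing in both directions and can be large.
unsorted_gaps = deltas(original)

# Sorted: every gap is non-negative, which is friendlier to compact encodings,
# but the original ordering is lost in the process.
sorted_gaps = deltas(sorted(original))

print(unsorted_gaps)
print(sorted_gaps)
```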
There are three workarounds, but none are great. The first is to access the document's _source, which gives you access to the original JSON. Unfortunately, the source has to be loaded off disk and decompressed for each document... it'll be a lot slower than doc values.
The second option is to store the values in a nested field, where each value is a different "nested document". This maintains order but isn't really useful from a script.
The last option is to store the vector in a string, and then just parse that string into numerics inside your script. It'll have a bit of overhead due to the string parsing, and won't compress as well in the index as a string vs. array of numerics... but you'll be able to retain ordering.
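A minimal sketch of the string approach, written in Python for the indexing side. The helper names and the comma separator are assumptions for illustration; the script that reads the field would need to split and parse the string the same way.

```python
def encode_vector(values, sep=","):
    """Join floats into one string, keeping their original order."""
    # repr() of a Python float round-trips exactly through float().
    return sep.join(repr(float(v)) for v in values)


def decode_vector(text, sep=","):
    """Parse the string back into a list of floats, in the stored order."""
    if not text:
        return []
    return [float(token) for token in text.split(sep)]


vector = [3.5, -1.25, 0.1, 2.0]
stored = encode_vector(vector)    # the value you would index as a string field
restored = decode_vector(stored)  # what the script-side parsing should yield
```

The parsing cost is paid per document at query time, which is the overhead mentioned above.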
Thank you for your answer, @polyfractal. I decided to go with a fourth option, which I wanted to avoid but which will be the fastest one: binary doc values. Thank you very much once again.
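For anyone landing here later, a rough sketch of what the client-side packing could look like in Python. The layout (big-endian 32-bit floats, then Base64, which is how a `binary` field value is sent in JSON) and the helper names are my assumptions, not something prescribed by ES; whatever reads the bytes back has to use the same layout.

```python
import base64
import struct


def pack_vector(values):
    """Pack floats as big-endian float32 bytes and Base64-encode them."""
    raw = struct.pack(f">{len(values)}f", *values)
    return base64.b64encode(raw).decode("ascii")


def unpack_vector(encoded):
    """Reverse pack_vector: Base64-decode, then read 4 bytes per float."""
    raw = base64.b64decode(encoded)
    count = len(raw) // 4
    return list(struct.unpack(f">{count}f", raw))


# Values chosen to be exactly representable as float32.
vector = [3.5, -1.25, 0.5, 2.0]
encoded = pack_vector(vector)
decoded = unpack_vector(encoded)
```

Note that float32 keeps only about 7 significant decimal digits, so values such as 0.1 will not round-trip exactly; switch the format character to `d` if you need 64-bit precision.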