Most efficient way of accessing long[][] data in Painless scripts

Hi!

We have a custom solution for calculating similarities using feature vectors. Each document in the Elasticsearch index can have multiple entities, so for each document we effectively have long[][] data that we would like to use to calculate distances. For example:

[
  [378322298287171600,-9182346388506132000,-7884923301547995000,2954398850619687400,5792760765226170000,6941191355558596000,-9175934689997701000,2453767474651472000],
  [2395942447151390700,7206045950792974000,-6761273774897486000,648553841033347700,-4591414079501816000,3563632123683616000,288379928265751740,733693665263878500],
  ...
]

How should we define the mapping so that a Painless script can read this data efficiently, ideally like this?

long[][] vectors = doc['vectors'].value;

Thanks for all the tips! :pray:

FYI: I found the page "Access fields in a document with the field API" in the Elasticsearch Guide [8.8], which covers accessing binary fields, so I tried that with this mapping:

"viewVectors" : {
  "properties" : {
    "id_01_00" : {
      "type" : "binary",
      "store" : true,
      "doc_values" : true
    }
  }
}

With that I seem to be able to get a BytesRef. I would then like to use Java's ByteArrayInputStream and ObjectInputStream to turn it into a List<List<Long>>, but apparently the stream classes from java.io.* are not available in Painless. :pensive:
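
One possible way around the missing java.io classes, sketched here in plain Java because Painless shares most of its syntax: pack the vectors into a fixed binary layout on the indexing side, then decode them with nothing more than array indexing and bit shifts. The layout (row count, column count, then big-endian longs), the class name VectorCodec, and its method names are all assumptions made for illustration. Whether the decode loop ports to Painless unchanged, and how fast it runs there, is untested.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class VectorCodec {

    // Hypothetical layout: [int rows][int cols][rows * cols longs], all big-endian.
    // Runs on the indexing client, where java.nio is available.
    static byte[] encode(long[][] vectors) {
        int rows = vectors.length;
        int cols = rows == 0 ? 0 : vectors[0].length;
        ByteBuffer buf = ByteBuffer.allocate(8 + rows * cols * 8); // big-endian by default
        buf.putInt(rows).putInt(cols);
        for (long[] row : vectors) {
            if (row.length != cols) {
                throw new IllegalArgumentException("ragged rows are not supported by this layout");
            }
            for (long v : row) {
                buf.putLong(v);
            }
        }
        return buf.array();
    }

    // Reads a big-endian int using only indexing and shifts.
    static int readInt(byte[] b, int off) {
        return ((b[off] & 0xFF) << 24)
             | ((b[off + 1] & 0xFF) << 16)
             | ((b[off + 2] & 0xFF) << 8)
             |  (b[off + 3] & 0xFF);
    }

    // Reads a big-endian long using only indexing and shifts.
    static long readLong(byte[] b, int off) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (b[off + i] & 0xFFL);
        }
        return v;
    }

    // Decodes without any java.io stream classes. The offset parameter mirrors
    // the bytes/offset pair that a BytesRef exposes.
    static long[][] decode(byte[] bytes, int offset) {
        int rows = readInt(bytes, offset);
        int cols = readInt(bytes, offset + 4);
        long[][] out = new long[rows][cols];
        int pos = offset + 8;
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                out[r][c] = readLong(bytes, pos);
                pos += 8;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long[][] vectors = {
            {378322298287171600L, -9182346388506132000L},
            {2395942447151390700L, 7206045950792974000L}
        };
        byte[] packed = encode(vectors);
        long[][] restored = decode(packed, 0);
        System.out.println("round trip ok: " + Arrays.deepEquals(vectors, restored));
    }
}
```

The packed bytes would be Base64-encoded before indexing, since the binary field type takes a Base64 string. Storing the row and column counts up front keeps the decoder to a single pass and avoids Java object serialization altogether, which is the part that needs ObjectInputStream.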
