Script Sorting Appears to Be Limited to 32-bit Floats


(davrob) #1

I'm using ElasticSearch Version 0.17.6

Currently, I'm running a Java sort script, "3 Letters Plus Number String
Sort" (https://gist.github.com/3145802), on one field.

I need to extend this to cover multi-column sort. Unfortunately,
ElasticSearch (in my version) seems to limit me to creating numbers that
only differ from each other by about 24 or 30 bits, no matter whether I
implement AbstractFloatSearchScript, AbstractLongSearchScript
or AbstractDoubleSearchScript. I thought that Long and Double might give
me 64 bits' worth of room to compare numbers with, but whichever
version I use, I always seem to run into a limit: if numbers
differ by more than about 2^30, ElasticSearch seems to refuse to
compare them.
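
For what it's worth, the "about 24 bits" figure matches the 24-bit significand of an IEEE 754 single-precision float, which is what you would see if the value were narrowed to a float somewhere along the way. The sketch below only demonstrates that arithmetic in plain Java; it does not touch ElasticSearch, and the class and method names are made up for illustration.

```java
// Sketch: where distinct long values stop being distinguishable
// after a cast to float (24-bit significand) or double (53-bit significand).
class FloatPrecisionDemo {

    // True if two longs become equal once both are cast to float.
    static boolean collideAsFloat(long a, long b) {
        return (float) a == (float) b;
    }

    // True if two longs become equal once both are cast to double.
    static boolean collideAsDouble(long a, long b) {
        return (double) a == (double) b;
    }

    public static void main(String[] args) {
        long floatLimit = 1L << 24;   // every integer up to 2^24 fits in a float
        long doubleLimit = 1L << 53;  // every integer up to 2^53 fits in a double

        System.out.println("float,  2^24-1 vs 2^24:   " + collideAsFloat(floatLimit - 1, floatLimit));
        System.out.println("float,  2^24   vs 2^24+1: " + collideAsFloat(floatLimit, floatLimit + 1));
        System.out.println("double, 2^24   vs 2^24+1: " + collideAsDouble(floatLimit, floatLimit + 1));
        System.out.println("double, 2^53   vs 2^53+1: " + collideAsDouble(doubleLimit, doubleLimit + 1));
    }
}
```

If the sort value really does pass through a float, neighbouring values above 2^24 would start to tie, regardless of whether the script computed them as Long or Double.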

It is difficult to produce a reproduction, but this is my attempt to expand
the range of numbers I compare to 64 bits (I tried Doubles as well as Longs):
"Attempted Multi-Column Sort Using Long return value",
https://gist.github.com/3145858

David.


(Clinton Gormley) #2

Hi David

> Currently, I'm running a Java Sort Script 3 Letters Plus Number String
> Sort on one field.
>
> I need to extend this to cover multi-column sort, unfortunately
> ElasticSearch (in my version) seems limited to me creating numbers
> that only differ from each other by about 24 or 30 bits, no matter
> whether I implement AbstractFloatSearchScript,
> AbstractLongSearchScript or AbstractDoubleSearchScript - I thought
> that Long and Double might give me 64 bits worth of room to compare
> numbers with - but no matter which version I use I always appear to
> run into a limit where if numbers differ by more than about 2^30
> - then ElasticSearch seems to refuse to compare them.

I'll leave the Java internals to somebody else, but using scripts for
sorting is not the most efficient way of doing it, not least because the
results don't get cached.

Can you give us an example of real data and how you want to sort it? It
may be possible to index the data in a way that you can sort without
scripts.

ta

clint


(davrob) #3

Hi Clinton,

From your response, are you saying that custom scripts are limited to 32
bits only? I was hoping that someone would know how to
extend AbstractSearchScript or AbstractExecutableScript to give me more
numbers to sort against.

I kind of anticipated that someone would suggest altering the way I index
this particular field, and that is a reasonable question, but the reasons
for doing it the way we do are quite involved.

The data structure we are using is a field on a contact, the field is
defined like this:

"customColumns": {
    "type": "nested",
    "properties": {
        "ccId":    {"type": "string", "index": "not_analyzed"},
        "value":   {"type": "string"},
        "valueLC": {"type": "string", "index": "not_analyzed"}
    }
}

It is essentially a name:value mapping between ccId and value. We
calculate this ccId at runtime to create "context aware fields": the ccId
is generated from the userId and the list that a contact is in. The value
for that ccId is then looked up in the
customColumns source (which is a map in the Java) and added as a top-level
field on the contact (which is also a map). In pseudo code, the process,
which is executed at runtime after the query results are returned, is:

ccId, ccName = f(contactId, userId, listId);
Map contactSource = results.getSource();
Map customColsSource = contactSource["customColumns"];
value = customColsSource[ccId];
contactSource[ccName] = value;
delete contactSource["customColumns"];

... then convert List to JSON and return.
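
The pseudo code above could be sketched in Java roughly as follows. This is only an illustration under assumptions: it treats the customColumns source as a simple map from ccId to value (as described above), and the class name, method name and sample data are hypothetical, not the actual application code.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: promote one context-specific custom column to a top-level field.
class ContextFieldResolver {

    // Returns a copy of contactSource in which the value stored under ccId
    // inside "customColumns" appears as a top-level field named ccName,
    // and the raw "customColumns" entry is removed.
    // Assumption: "customColumns" holds a Map keyed by ccId.
    static Map<String, Object> resolve(Map<String, Object> contactSource,
                                       String ccId, String ccName) {
        Map<String, Object> result = new HashMap<>(contactSource);
        Object raw = result.remove("customColumns");
        if (raw instanceof Map) {
            Object value = ((Map<?, ?>) raw).get(ccId);
            if (value != null) {
                result.put(ccName, value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample data.
        Map<String, Object> customColumns = new HashMap<>();
        customColumns.put("cc-42", "Gold");

        Map<String, Object> contact = new HashMap<>();
        contact.put("name", "Ada");
        contact.put("customColumns", customColumns);

        Map<String, Object> resolved = resolve(contact, "cc-42", "tier");
        System.out.println("tier = " + resolved.get("tier"));
        System.out.println("has customColumns: " + resolved.containsKey("customColumns"));
    }
}
```

The sketch copies the source map rather than mutating it, which is a design choice of the example and not something stated in the post.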

The effect of this is to create a context aware field on the index. Each
user may have up to 50 of these, and there are 3000 users, so we potentially
have something like 150,000 fields that can be added to a contact depending
on context.

Otherwise, contacts have about 70 fields. 400,000 contacts take up about
1.5GB of VM memory.

- David.



(Clinton Gormley) #4

Hiya

> From your response, are you saying that custom scripts are limited to
> 32 bits only? I was hoping that someone would know how to extend
> AbstractSearchScript, AbstractExecutableScript to give me more numbers
> to sort against.

No, I'm not - I have no idea :)

> The effect of this is to create a context aware field on the index.
> Each user may have up to 50 of these, and there are 3000 users, so we
> potentially have something like 150,000 fields that can be added to a
> contact depending on context.

OK, so my solution seems unlikely to help.

clint


(davrob) #5

Thanks for your input Clinton, much appreciated, as always.

Regarding the sort scripts, I've been looking at the Java code that invokes
the custom sort scripts. It is a class called CustomScoreQueryParser, which
has an inner class that invokes this function:

@Override
public float score(int docId, float subQueryScore) {
    script.setNextDocId(docId);
    script.setNextScore(subQueryScore);
    return script.runAsFloat();
}

I think the particular line is "return script.runAsFloat();".

It looks like the Lucene Score collector (TopScoreDocCollector) is
expecting floats, when invoking this code:

public void collect(int doc) throws IOException {
    float score = scorer.score();
    ...
}

So that is where the limitation lies: the Lucene collector expects
floats, so the ElasticSearch custom script has no option but to give it
floats. Even if the calculation is done in terms of Longs or Doubles, the
result will always be cast to a float.
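
To make the consequence for multi-column sorting concrete, here is a small sketch of what such a cast does to a packed key. The packing scheme (first column in the high 32 bits, second column in the low 32 bits) is a hypothetical stand-in and not the code from the gist, and the float cast merely imitates what a call like runAsFloat() would do to the result.

```java
// Sketch: a two-column sort key packed into a long, compared before and
// after being narrowed to float.
class PackedSortKeyDemo {

    // Hypothetical packing: primary column in the high 32 bits,
    // secondary column in the low 32 bits.
    static long pack(int primary, int secondary) {
        return ((long) primary << 32) | (secondary & 0xFFFFFFFFL);
    }

    // Comparison on the full 64-bit key.
    static int compareAsLong(long a, long b) {
        return Long.compare(a, b);
    }

    // Comparison after both keys have been cast to float,
    // imitating a score that is returned as a float.
    static int compareAsFloat(long a, long b) {
        return Float.compare((float) a, (float) b);
    }

    public static void main(String[] args) {
        long x = pack(1, 5);
        long y = pack(1, 6);  // same primary column, different secondary
        long z = pack(2, 0);  // different primary column

        System.out.println("x vs y as long:  " + compareAsLong(x, y));
        System.out.println("x vs y as float: " + compareAsFloat(x, y));
        System.out.println("x vs z as float: " + compareAsFloat(x, z));
    }
}
```

In this sketch the primary column still orders correctly after the cast, but the secondary column is lost entirely, which would look like ElasticSearch declining to compare keys that differ only in their low-order bits.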

Seems like I have no way round this limitation.

If there are any other suggestions I would welcome them.

- David.


