Hi Clinton,
From your response, are you saying that custom scripts are limited to 32
bits only? I was hoping that someone would know how to
extend AbstractSearchScript or AbstractExecutableScript to give me a wider
range of numbers to sort against.
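To make the limit concrete, here is a minimal sketch of how a packed multi-column sort key can collapse when it is narrowed. It assumes, without confirmation from anything in this thread, that the script's return value is converted to a float or double somewhere before comparison. The class name, the pack layout and the helper names are hypothetical and are not part of the Elasticsearch API.

```java
class SortKeyPrecision {

    // Hypothetical layout: primary sort column in the high 32 bits,
    // secondary sort column in the low 32 bits of one 64-bit key.
    static long pack(int primary, int secondary) {
        return ((long) primary << 32) | (secondary & 0xFFFFFFFFL);
    }

    // A float keeps a 24-bit significand, so low-order bits of a long are rounded away.
    static boolean distinctAsFloat(long a, long b) {
        return (float) a != (float) b;
    }

    // A double keeps a 53-bit significand: more room, but still short of 64 bits.
    static boolean distinctAsDouble(long a, long b) {
        return (double) a != (double) b;
    }

    public static void main(String[] args) {
        long a = pack(7, 1);
        long b = pack(7, 2);
        System.out.println("as long:   " + (a < b));
        System.out.println("as float:  " + distinctAsFloat(a, b));
        System.out.println("as double: " + distinctAsDouble(a, b));

        // With a large enough primary component, even a double merges the keys.
        long c = pack(1 << 22, 1);
        long d = pack(1 << 22, 2);
        System.out.println("large keys as double: " + distinctAsDouble(c, d));
    }
}
```

If something like this is what happens internally, two keys that differ only in their low-order bits would compare as equal, which would look much like the behaviour I am seeing.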
I kind of anticipated that someone would suggest altering the way I index
this particular field, and that is a reasonable suggestion, but the reasons
for doing it the way we do are quite involved.
The data structure we are using is a field on a contact. The field is
defined like this:
"customColumns": {
    "type": "nested",
    "properties": {
        "ccId":    {"type": "string", "index": "not_analyzed"},
        "value":   {"type": "string"},
        "valueLC": {"type": "string", "index": "not_analyzed"}
    }
}
It is essentially a name:value mapping between ccId and value. We
calculate this ccId at runtime to create "context aware fields": depending
on the userId and the list that a contact is in, a ccId is generated.
The value for that ccId is then looked up in the
customColumns source (which is a map in the Java) and added as a top-level
field on the contact (which is also a map). In pseudocode, the process,
which is executed at runtime after the query results are returned, is:
ccId, ccName = f(contactId, userId, listId);
Map contactSource = results.getSource();
Map customColsSource = contactSource["customColumns"];
value = customColsSource[ccId];
contactSource[ccName] = value;
delete contactSource["customColumns"];
... then convert List to JSON and return.
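As a rough Java sketch of the step above: this assumes the customColumns source has already been turned into a map keyed by ccId, as the pseudocode suggests. The real _source shape for a nested field may differ. The class and method names are hypothetical, and the function f that derives ccId and ccName is left out.

```java
import java.util.HashMap;
import java.util.Map;

class ContextFields {

    // Copies the value stored under ccId to a top-level field named ccName
    // and drops the customColumns entry from the returned copy.
    static Map<String, Object> promote(Map<String, Object> contactSource,
                                       String ccId, String ccName) {
        Map<String, Object> out = new HashMap<>(contactSource);
        Object cols = out.remove("customColumns");
        if (cols instanceof Map) {
            Object value = ((Map<?, ?>) cols).get(ccId);
            if (value != null) {
                out.put(ccName, value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        Map<String, Object> cols = new HashMap<>();
        cols.put("cc42", "Acme Ltd");

        Map<String, Object> contact = new HashMap<>();
        contact.put("name", "Jane");
        contact.put("customColumns", cols);

        System.out.println(promote(contact, "cc42", "Company"));
    }
}
```

The sketch works on a copy so that the original hit source is left untouched. Whether that matters depends on how the results are reused afterwards.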
The effect of this is to create a context-aware field on the index. Each
user may have up to 50 of these, and there are 3000 users, so we potentially
have something like 150,000 fields that can be added to a contact depending
on context.
Otherwise, contacts have about 70 fields. 400,000 contacts take up about
1.5GB of VM memory.
On Friday, July 20, 2012 9:19:04 AM UTC+1, Clinton Gormley wrote:
Hi David
Currently, I'm running a Java Sort Script 3 Letters Plus Number String
Sort on one field.
I need to extend this to cover multi-column sort. Unfortunately,
Elasticsearch (in my version) seems to limit me to creating numbers
that only differ from each other by about 24 or 30 bits, no matter
whether I implement AbstractFloatSearchScript,
AbstractLongSearchScript or AbstractDoubleSearchScript. I thought
that Long and Double might give me 64 bits' worth of room to compare
numbers with, but no matter which version I use, I always appear to
run into a limit: if numbers differ by more than about 2^30,
Elasticsearch seems to refuse to compare them.
I'll leave the Java internals to somebody else, but using scripts for
sorting is not the most efficient way of doing it, not least because the
results don't get cached.
Can you give us an example of real data and how you want to sort it? It
may be possible to index the data in a way that you can sort without
scripts.
ta
clint