hi all!
i am getting some very strange behavior while using scripted sorting.
the script itself looks approximately like this:
if (doc['fieldName1'].size() != 0 && doc['fieldName2'].size() != 0) { // 1 - might not be true, so a fallback scenario is added
    long field1 = doc['fieldName1'].value;
    double field2 = doc['fieldName2'].value;
    if (field1 == 2) {
        return field1 * params.paramValueName / field2;
    } else {
        return field2 * params.paramValueName / field1;
    }
} else {
    if (doc['fallbackField'].size() != 0) { // 2 - always true, checked for emptiness out of safety reasons
        return doc['fallbackField'].value;
    } else {
        return 1; // least possible case
    }
}
each time the script runs, it ends up in branch 2, where the fallback scenario returns the value of the fallback field.
but when i query the index manually via curl by a known id, i can see that the values checked in the first if are always filled.
In your script you are looking at doc values, but your curl command returns _source.
To read from _source in your script, you can use, for example:
params._source['fieldName1']
Alternatively, make sure that mappings are defined for your fields so that doc values exist.
Whether doc values or _source is the better choice depends on what you want to do, so it is hard to answer without knowing your use case. Please have a look at the documentation link.
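To make the _source suggestion concrete, here is a minimal sketch in Python that only builds the search request body; it does not talk to a cluster. The Painless script inside it is an untested assumption: it takes for granted that params._source is reachable from a sort script in your Elasticsearch version, it replaces the size() checks with null checks, and it uses the simplified rule "price times rate" rather than the exact arithmetic from the question. The helper name build_search_body is made up for illustration.

```python
import json

# Painless sort script reading from _source instead of doc values.
# Field and parameter names are the placeholders used in this thread.
# Assumption: params._source is available in the sort-script context.
SORT_SCRIPT = """
def price = params._source['fieldName1'];
def currency = params._source['fieldName2'];
if (price != null && currency != null) {
  return price * params.paramValueName;
}
def fallback = params._source['fallbackField'];
return fallback != null ? fallback : 1;
"""


def build_search_body(rate, order="asc"):
    """Build a _search body that sorts by the script above (hypothetical helper)."""
    return {
        "query": {"match_all": {}},
        "sort": [
            {
                "_script": {
                    "type": "number",
                    "order": order,
                    "script": {
                        "lang": "painless",
                        "source": SORT_SCRIPT.strip(),
                        "params": {"paramValueName": rate},
                    },
                }
            }
        ],
    }


# Print the JSON so it can be pasted into a curl request.
print(json.dumps(build_search_body(rate=1.1), indent=2))
```

Keep in mind that reading _source in a sort script is generally slower than reading doc values, because the source of every matching document has to be loaded and parsed.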
the curl command was just a naive proof that those fields are not empty. i thought it would be a nice example showing that the documents being sorted actually contain valid data, which is what the sorting itself uses. there is a non-zero possibility that i just got it wrong :))
i will try to explain the whole scenario:
we have a set of documents that have prices (long numeric values). basically, "fieldName1" is "price" and "fieldName2" is "currency". "fallbackField" is an outdated field that also contains a price, but it is going to be deprecated soon, so we do not really want to rely on it.
if price and currency are not null, an exchange rate ("params.paramValueName") should be applied to the price value, so the documents should be sorted by price * exchange rate. if price or currency is not known, the value of the fallback field should be used as the sort value.
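The rule described here can be modelled outside Elasticsearch to check which branch a given document should take. This is a plain Python sketch, not Painless; the function name sort_key and the sample documents are invented for illustration, and "currency" is only checked for presence.

```python
def sort_key(doc, rate):
    """Return the value a document would be sorted by under the described rule."""
    price = doc.get("price")
    currency = doc.get("currency")
    # Branch 1: both price and currency are known, apply the exchange rate.
    if price is not None and currency is not None:
        return price * rate
    # Branch 2: fall back to the outdated price field.
    fallback = doc.get("fallbackField")
    if fallback is not None:
        return fallback
    # Last resort: constant sort value.
    return 1


# Made-up sample documents covering all three branches.
docs = [
    {"id": "a", "price": 100, "currency": 1.0},
    {"id": "b", "fallbackField": 90},
    {"id": "c"},
]

ordered = sorted(docs, key=lambda d: sort_key(d, rate=2))
print([d["id"] for d in ordered])  # ['c', 'b', 'a']
```

If a document that has price and currency in its source still sorts as if it took branch 2, the values the script sees are not the ones you see in the curl output.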
The curl command does not return the same values that your script sees. Again, you are comparing doc values and _source. These are not the same: doc values are normalized. For example, you could ingest a timestamp as a long but map it as a date with a timezone difference. Doc values might also be disabled for certain fields if you do not plan to sort or aggregate on them.
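One way to see the difference for a single document is to ask for both _source and doc values in the same request, using the docvalue_fields option of the search API, and compare the two. The sketch below only builds the request body and inspects a response hit; the helper names and the sample hit are hypothetical. Depending on version and mapping, a request for a field without doc values may fail with an error instead of returning nothing, which is informative in itself.

```python
def build_compare_body(doc_id, fields):
    """Search body returning _source and doc values for one document."""
    return {
        "query": {"ids": {"values": [doc_id]}},
        "_source": fields,
        "docvalue_fields": fields,
    }


def missing_doc_values(hit, fields):
    """List fields that are set in a hit's _source but absent from its doc values."""
    source = hit.get("_source", {})
    doc_values = hit.get("fields", {})  # docvalue_fields results appear under "fields"
    return [f for f in fields if source.get(f) is not None and f not in doc_values]


fields = ["fieldName1", "fieldName2", "fallbackField"]

# Hypothetical hit, shaped like an entry of hits.hits in a search response.
sample_hit = {
    "_id": "1",
    "_source": {"fieldName1": 100, "fieldName2": 1.0, "fallbackField": 90},
    "fields": {"fallbackField": [90]},
}

print(build_compare_body("1", fields))
print(missing_doc_values(sample_hit, fields))  # ['fieldName1', 'fieldName2']
```

Any field reported here is one where doc['...'].size() would be 0 in the script even though the curl output shows a value.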