In Elasticsearch, I have implemented a custom script in Java (packaged as an Elasticsearch plugin) and am using it in a query. The Java code is a method that accepts two strings and returns a value. One is a hard-coded string; the other should be the current document's field value. I'm not able to pass the document field value as a parameter. Maybe I'm doing it wrong and there is an easier way of doing it.
In the query, the second parameter should be "current document's value of movieName field", so that the string 'The Star Wars' is compared with the movieName value of every document. If I replace this with a hard-coded value, it works.
Alternatively, if the field can be accessed directly in the Java code instead of being passed as a parameter, that would work as well.
params is for passing constant values for the execution of a single query. To access doc values, you need to interact with Lucene from within your script. Have you looked at how the advanced script example in the docs does this?
Thank you for replying. I did look at the advanced script example docs, but I don't understand how to access the field value per document. My initial understanding was that the native script should execute for every document, like a function would for every row in a SQL query. If the query returns 3 documents (doc: 1, 2, 3), then to compare against the constant value 'The Star Wars', the script should run as:
doc:1, movieName:"Star Wars" (script runs to compare ('The Star Wars','Star Wars'))
doc:2, movieName:"Starr Warz" (script runs to compare ('The Star Wars','Starr Warz'))
doc:3, movieName:"The Star Wars" (script runs to compare ('The Star Wars','The Star Wars'))
I have two questions with respect to this:
Will the script run per query or per document in the query's result? i.e. 1 time vs 3 times as for the above?
If it runs per document, how can I access the document field value? 'The Star Wars' input value can be passed as params and the document field value can be accessed within the plugin.
Will the script run per query or per document in the query's result? i.e. 1 time vs 3 times as for the above?
While this depends on the type of script run, in your case (a scoring script) it will be run for every document matching the query.
If it runs per document, how can I access the document field value? 'The Star Wars' input value can be passed as params and the document field value can be accessed within the plugin.
While this is not in the advanced scoring script example I referenced in my last reply, I will use that as a basis for how to access this data. Within the newInstance method, you would add a line like:
SortedSetDocValues dv = DocValues.getSortedSet(context.reader(), "movieName");
Then inside setDocument(int docid) of the created SearchScript, you would advance the doc values iterator:
dv.advanceExact(docid);
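And finally, within the runAsDouble() method, you can access the value of the field by reading the document's next ordinal and looking up its bytes, along these lines:
String movieName = dv.lookupOrd(dv.nextOrd()).utf8ToString();
There are a couple of things to note in the last code snippet: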
nextOrd() should only be called if advanceExact returned true. My example assumes every document has a value for the "movieName" field, and thus does not check it, but it would be good to assert or verify this.
It would be more efficient to use the raw BytesRef if possible. For example, if you are doing a comparison to another static string, you can construct a BytesRef for that string once and call compareTo, rather than decoding the UTF-8 bytes of every movie name into a String for every document.
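The raw-bytes idea in the second note can be illustrated without Lucene. The following is only a sketch using the JDK: plain byte[] arrays stand in for BytesRef, and the class and method names (RawBytesCompare, matchesTarget, compareToTarget) are made up for illustration. The constant is encoded to UTF-8 once, and each document's bytes are compared directly instead of being decoded to a String first.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in: byte[] plays the role of Lucene's BytesRef here.
public class RawBytesCompare {
    // Encode the constant once, outside any per-document loop.
    static final byte[] TARGET = "The Star Wars".getBytes(StandardCharsets.UTF_8);

    // Equality check on the raw UTF-8 bytes; no String is created.
    static boolean matchesTarget(byte[] fieldBytes) {
        return Arrays.equals(fieldBytes, TARGET);
    }

    // Unsigned lexicographic comparison of the raw bytes (requires Java 9+).
    static int compareToTarget(byte[] fieldBytes) {
        return Arrays.compareUnsigned(fieldBytes, TARGET);
    }

    public static void main(String[] args) {
        String[] movieNames = {"Star Wars", "Starr Warz", "The Star Wars"};
        for (String name : movieNames) {
            // In a real script these bytes would come from doc values,
            // not from re-encoding a String as done here for the demo.
            byte[] fieldBytes = name.getBytes(StandardCharsets.UTF_8);
            System.out.println(name + " equal=" + matchesTarget(fieldBytes)
                    + " sign=" + Integer.signum(compareToTarget(fieldBytes)));
        }
    }
}
```

This only helps for byte-level checks such as exact equality or ordering. A similarity measure that works on characters would still need the decoded String.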
private static class DocsRescorer implements Rescorer {
    private static final DocsRescorer INSTANCE = new DocsRescorer();

    @Override
    public TopDocs rescore(TopDocs topDocs, IndexSearcher searcher, RescoreContext rescoreContext) throws IOException {
        DocRescoreContext context = (DocRescoreContext) rescoreContext;
        int end = Math.min(topDocs.scoreDocs.length, rescoreContext.getWindowSize());
        // Lower-case variable name so it does not shadow the MovieScorer class
        MovieScorer movieScorer = new MovieScorerBuilder()
                .withInputName(context.MovieName)
                .build();
        for (int i = 0; i < end; i++) {
            String name = <get MovieName values from actual document returned by topdocs>
            float score = movieScorer.calculateScore(name);
            topDocs.scoreDocs[i].score = score;
        }
        // Keep only hits at or above the threshold, sorted by score descending, then doc id ascending
        List<ScoreDoc> scoreDocList = Stream.of(topDocs.scoreDocs).filter((a) -> a.score >= context.threshold).sorted(
            (a, b) -> {
                if (a.score > b.score) {
                    return -1;
                }
                if (a.score < b.score) {
                    return 1;
                }
                // Safe because doc ids >= 0
                return a.doc - b.doc;
            }
        ).collect(Collectors.toList());
        ScoreDoc[] scoreDocs = scoreDocList.toArray(new ScoreDoc[scoreDocList.size()]);
        topDocs.scoreDocs = scoreDocs;
        return topDocs;
    }

    @Override
    public Explanation explain(int topLevelDocId, IndexSearcher searcher, RescoreContext rescoreContext,
            Explanation sourceExplanation) throws IOException {
        DocRescoreContext context = (DocRescoreContext) rescoreContext;
        // Note that this is inaccurate because it ignores factor field
        return Explanation.match(context.factor, "test", singletonList(sourceExplanation));
    }

    @Override
    public void extractTerms(IndexSearcher searcher, RescoreContext rescoreContext, Set<Term> termsSet) {
        // Since we don't use queries there are no terms to extract.
    }
}
My understanding is that the plugin code will execute once: it will get topDocs as the results of the initial query (the fuzzy search in this case), and for (int i = 0; i < end; i++) will loop through each document returned in the result. The place where I need help is:
String name = get MovieName value from actual document returned by topdocs