Access to documents in ScriptEngine plugin

Cameron_VandenBerg · July 2, 2018, 9:08pm

Is it possible to access document fields or term vectors within a plugin that implements ScriptEngine? I have a field for each document which contains the length of the document, and I would like to use that value as well as the term frequency (tf) to compute a score. I am following this example for writing a plugin: https://www.elastic.co/guide/en/elasticsearch/reference/master/modules-scripting-engine.html. However, all my attempts to access the actual document come back null.

rjernst · July 2, 2018, 9:21pm

Can you share any of your code?

Cameron_VandenBerg · July 3, 2018, 3:03pm

Here is one example of what I have tried. This is part of my class that implements ScriptEngine, just like in the example. The problem is that the Document returned from the reader (the LeafReader from the context) is null. Is there any way to get a handle on the document within the ScriptEngine context?

```
@Override
public SearchScript newInstance(LeafReaderContext context) throws IOException {
    LeafReader reader = context.reader();
    PostingsEnum postings = context.reader().postings(new Term(field, term));

    return new SearchScript(p, lookup, context) {
        int currentDocid = -1;

        @Override
        public void setDocument(int docid) {
            if (postings != null) {
                // advance has undefined behavior calling with a
                // docid <= its current docid
                if (postings.docID() < docid) {
                    try {
                        postings.advance(docid);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }
            }
            currentDocid = docid;
        }

        @Override
        public double runAsDouble() {
            try {
                double mleScore = 0f;
                Document doc = reader.document(currentDocid);
                System.out.println("Document: " + doc);
                double doclen = Double.valueOf(doc.getField("body_len").stringValue()).doubleValue();
                System.out.println("Document length: " + doclen);
                if (postings != null) {
                    mleScore = postings.freq() / doclen;
                }

                return mleScore;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    };
}
```

rjernst · July 3, 2018, 4:27pm

If you want doc values, the Document is not what you want (that is for stored fields access). You need to get an appropriate doc values instance for the type of data. You can do this using the helper DocValues class from Lucene. For example, to get numeric doc values for a field called "mynum", you would do this next to the postings declaration:

SortedNumericDocValues mynumValues = DocValues.getSortedNumeric(reader, "mynum");
boolean hasMynumValue;

Then in setDocument:

hasMynumValue = mynumValues.advanceExact(docid);

And finally, in the scoring function, if hasMynumValue is true, you can use the doc values iterator to extract the values for the current document by calling mynumValues.nextValue() once for each value the doc has (you can find how many values to expect with mynumValues.docValueCount()).
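
Once the length comes from doc values, the scoring step in the question reduces to dividing the term frequency by that length. The following is a minimal plain-Java sketch of just that arithmetic, with no Lucene dependency. The class and method names (MleScore, score) are hypothetical and not part of any Elasticsearch or Lucene API. It assumes that in the real script termFreq would come from postings.freq() and docLength from mynumValues.nextValue().

```java
// Hypothetical helper, not part of Elasticsearch or Lucene.
final class MleScore {

    private MleScore() {
    }

    /**
     * Maximum-likelihood estimate of the term probability: tf / document length.
     * Returns 0 when the length is missing or non-positive, so a document
     * without a usable length value never causes a division by zero.
     */
    static double score(int termFreq, long docLength) {
        if (docLength <= 0 || termFreq <= 0) {
            return 0.0;
        }
        // Cast to double so the division is not done in integer arithmetic.
        return termFreq / (double) docLength;
    }

    public static void main(String[] args) {
        // Illustrative values only: a term occurring 3 times in a 12-token document.
        System.out.println(MleScore.score(3, 12L));
    }
}
```

In the script itself, this would only be called when hasMynumValue is true and the postings are positioned on the current document. Otherwise the score should fall back to 0.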

Cameron_VandenBerg · July 3, 2018, 7:52pm

Thank you. If the term vector is stored for a field, is there a way to access that in the ScriptEngine as well?

rjernst · July 10, 2018, 6:46am

Yes, you can access term vectors, but you will need to read the Lucene documentation to learn how. An advanced script implemented through a ScriptEngine has a LeafReader, which is an IndexReader and therefore has getTermVector.

system · August 7, 2018, 6:46am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
