Using Painless in the filter portion of my search (limiting results by a recurring field in the document being greater than an average)

Hello, I have an interesting search problem, and I'm hoping someone here has encountered something similar. First, a bit of background: I'm currently writing query DSL to search old phone books that are loaded into Elasticsearch. They are loaded in a manner similar to XML, but the documents are not true XML: the phone book pages have been flattened and transformed, additional fields have been added to the documents, and at the top of each document are multiple metadata fields describing the phone book page. Each document in the index has a group of fields under LineStatistics that hold stats about that page of the phone book, including LineHeightAverage. This is visible in the image below, which shows the metadata part of the document. Note: each page in the phone book is one document in the index.

LineStatsImage

The LineHeightAverage field is created by averaging the LineHeight of every line on the phone book page. Since each page in the phone book is a document in the index, a document contains many lines of varying line heights, an example of which can be seen below.

My goal is to add a feature to my existing query that boosts the results returned to the user if the Columns.Line.LineHeight is above the average line height for the document/page. This is important because the user is looking for street names, and the street names in these phone books are always capitalized and larger than the surrounding text on the page.

I have thought about using an average for the entire book, but I believe this would give subpar results. I am also trying to avoid a query that involves two round trips to Elasticsearch. My end goal is to significantly boost the results where the LineHeight for the related LineText field is greater than the LineHeightAverage for the document (a page in the phone book). Please let me know if you have any questions; I can elaborate if needed. Below is a sample of the current working query.

Bump in the hopes someone sees this

Looking at using Painless now in the filter portion of my search, and possibly changing the title to reflect this.

I solved this problem by adding a short Painless script to my query. In a bool must section I put the following:

{
  "script": {
    "script": {
      "source": "doc['Columns.Lines.LineHeight'].value > doc['LineStatistics.LineHeightAverage'].value",
      "lang": "painless"
    }
  }
}

Now I'm just figuring out how to boost it, as it doesn't seem to have that functionality natively :slight_smile:
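
For anyone landing here later, here is a minimal sketch of one way the boost might be added: wrap the text query in a function_score query and use the same script as the filter of a weighted function. This is not confirmed as the final solution in this thread. The field path Columns.Lines.LineText, the search text, the weight of 5.0, and the helper name build_boosted_query are all assumptions for illustration. The Python below only builds and prints the request body, so it does not depend on any client library.

```python
import json

# The Painless condition from the post above, unchanged.
HEIGHT_SCRIPT = (
    "doc['Columns.Lines.LineHeight'].value > "
    "doc['LineStatistics.LineHeightAverage'].value"
)


def build_boosted_query(street_query, text_field="Columns.Lines.LineText", weight=5.0):
    """Build a function_score request body (hypothetical helper).

    Assumptions, not taken from the thread: the full path of the
    LineText field (text_field) and the default weight value.
    """
    return {
        "query": {
            "function_score": {
                # The ordinary relevance query; every match is still returned.
                "query": {"match": {text_field: street_query}},
                "functions": [
                    {
                        # Only documents passing the script filter get the weight.
                        "filter": {
                            "script": {
                                "script": {
                                    "source": HEIGHT_SCRIPT,
                                    "lang": "painless",
                                }
                            }
                        },
                        "weight": weight,
                    }
                ],
                # Multiply the original score by the weight when the filter matches.
                "boost_mode": "multiply",
            }
        }
    }


if __name__ == "__main__":
    # "MAIN ST" is a made-up search term.
    print(json.dumps(build_boosted_query("MAIN ST"), indent=2))
```

Unlike the bool must version, this does not drop documents that fail the height check. They keep their original score, and the ones that pass are ranked higher. The script is still evaluated once per document (page), so whether it reflects the specific matching line depends on how the Columns fields are mapped.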
