Rescorer plugin: how to access inner hits with their original scores

I send the following query to Elasticsearch:

GET my-index/_search
{
  "query": {
    "nested": {
      "inner_hits": {},
      "score_mode": "max",
      "path": "my_nested_field",
      "query": {
        "bool": {
          "should": [
            {
              "bool": {
                "must": [
                  {
                    "match": {
                      "my_nested_field.value.token_analyzed": {
                        "query": "Looking for something like this"
                      }
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    }
  },
  "rescore": {
    "my_plugin_name": {}
  }
}

Documents in the index look something like this:

{
	"some_field": "some_value",
	"some_other_field": "some_other_value",
	"my_nested_field": [
		{
			"value": "some nested value",
			"something_else": "something else"
		},
		{
			"value": "some nested value 2",
			"something_else": "something else 2"
		}
	]
}

My custom rescorer plugin runs and works correctly, but I would like to optimize it. Currently, for each hit, I use every element of my_nested_field to rescore the top-level document. I would like to use only the elements that actually caused the hit, but I don't know how to filter out the non-matching ones inside the plugin.

My current code:

public TopDocs rescore(TopDocs topDocs, IndexSearcher searcher, RescoreContext rescoreContext) throws IOException {
    for (int i = 0; i < topDocs.scoreDocs.length; i++) {
        // Load the stored fields (_source, _id) of the top-level hit.
        Document document = searcher.doc(topDocs.scoreDocs[i].doc);
        String json = parseSource(document);
    }
    ...

private String parseSource(Document document) {
    // utf8ToString() respects the BytesRef offset and length, unlike reading .bytes directly.
    return document.getField("_source").binaryValue().utf8ToString();
}
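
A note on decoding _source: a Lucene BytesRef exposes bytes, offset and length, and only the window from offset to offset + length is valid, so decoding the whole backing array can pick up stray bytes if that array is shared or oversized. BytesRef.utf8ToString() handles this. The sketch below shows the same idea with a plain byte array so that it runs without Lucene on the classpath; SourceDecoding and decodeUtf8 are hypothetical names used only for illustration.

```java
import java.nio.charset.StandardCharsets;

final class SourceDecoding {
    // Decode only the valid window [offset, offset + length) of a possibly shared buffer.
    static String decodeUtf8(byte[] bytes, int offset, int length) {
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Simulated shared buffer: the JSON payload is surrounded by unrelated bytes.
        byte[] shared = "xx{\"a\":1}yy".getBytes(StandardCharsets.UTF_8);
        // Decoding the whole array would include the "xx" and "yy" padding.
        System.out.println(decodeUtf8(shared, 2, 7));
    }
}
```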

The data I'm looking for is not in _source, but the only fields I can read this way are _source and _id. I assume that is because only stored fields can be loaded like this. Surely there must be some way to read the inner hits scoring results?

In the actual ES response, right next to each document's source, there is this (but I don't know how to access it in the plugin):

"inner_hits": {
    "my_nested_field": {
        "hits": {
            "total": {
                "value": 1,
                "relation": "eq"
            },
            "max_score": 4.2184687,
            "hits": [ // I NEED THIS STUFF NOT THE _source
                {
                    "_index": "my-index",
                    "_type": "_doc",
                    "_id": "8b3d929a-8e90-4ce7-aa1e-7f11ec16de1e",
                    "_nested": {
                        "field": "my_nested_field",
                        "offset": 2
                    },
                    "_score": 4.2184687,
                    "_source": {
                        "value": "Some value which was actually hit"
                    }
                }
            ]
        }
    }
}
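
To make the goal concrete: each inner hit carries a _nested.offset, which is the position of the matching element in the my_nested_field array of _source. Once those offsets are known, the selection step I want looks roughly like the sketch below. NestedSelection, selectMatched and the hard-coded offsets are hypothetical and only illustrate the filtering; the sketch does not show how to obtain the offsets inside the rescorer, which is the open question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class NestedSelection {
    // Keep only the nested elements whose array position is in matchedOffsets,
    // in ascending offset order. Offsets outside the array are ignored.
    static <T> List<T> selectMatched(List<T> nested, Set<Integer> matchedOffsets) {
        List<T> selected = new ArrayList<>();
        for (int offset : new TreeSet<>(matchedOffsets)) {
            if (offset >= 0 && offset < nested.size()) {
                selected.add(nested.get(offset));
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Assumed data: three nested values, of which only the one at offset 2 matched.
        List<String> nested = List.of(
                "some nested value",
                "some nested value 2",
                "Some value which was actually hit");
        System.out.println(selectMatched(nested, Set.of(2)));
    }
}
```

Only the selected elements would then be passed to the rescoring logic, while the full top-level document is still returned in the response.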

Side note: I need the full document after I make the query, not just the nested fields.
