Getting unique results by highlight or by multiple fields

My indices contain various analyses of executable files.

DELETE static_index,strings_index

PUT static_index/_doc/1
{ "md5": "85e9e2c4a9c0a7af309c906516aa4548",
  "static_api": "ShellExecuteW" }

PUT static_index/_doc/2
{ "md5": "85e9e2c4a9c0a7af309c906516aa4548",
  "static_api": "ShellExecuteW" }

PUT strings_index/_doc/3
{ "md5": "85e9e2c4a9c0a7af309c906516aa4548",
  "strings_api": "ShellExecuteA" }

How can I modify this query...

GET *_index/_search
{
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "query": "*Execute*",
          "analyze_wildcard": true
        }}}},
  "highlight": {
    "fields": {
      "*.keyword": {}
    }}}

to get:

  1. the first match per unique highlight?
  2. the first match per unique (md5 AND index)?

Desired reply:

"hits" : [
  { "_index" : "static_index",
    "_source" : {
      "md5" : "85e9e2c4a9c0a7af309c906516aa4548",
      "static_api" : "ShellExecuteW" },
    "highlight" : {
      "static_api.keyword" : [
        "<em>ShellExecuteW</em>"
      ]}},
  { "_index" : "strings_index",
    "_source" : {
      "md5" : "85e9e2c4a9c0a7af309c906516aa4548",
      "strings_api" : "ShellExecuteA" },
    "highlight" : {
      "strings_api.keyword" : [
        "<em>ShellExecuteA</em>"
      ]}}
]
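If the deduplication may happen in the client instead of inside Elasticsearch, one possible sketch is to walk the `hits.hits` list returned by the original query and keep the first hit seen for each key. The sketch assumes the response has already been parsed into a Python dict. The helper names (`first_per_key`, `highlight_key`, `md5_index_key`) are hypothetical, not part of any Elasticsearch client. It also assumes "unique highlight" means an identical highlight section (same field name and same fragments). Because it only deduplicates the hits that were actually returned, it depends on the search `size` being large enough to include every match.

```python
from collections.abc import Callable, Hashable
from typing import Any

Hit = dict[str, Any]  # one element of response["hits"]["hits"]


def first_per_key(hits: list[Hit], key: Callable[[Hit], Hashable]) -> list[Hit]:
    """Return the first hit for each distinct key, preserving response order."""
    seen: set[Hashable] = set()
    unique: list[Hit] = []
    for hit in hits:
        k = key(hit)
        if k not in seen:
            seen.add(k)
            unique.append(hit)
    return unique


def highlight_key(hit: Hit) -> Hashable:
    """Hashable form of the whole highlight section (field names + fragments)."""
    return tuple(
        sorted((field, tuple(frags)) for field, frags in hit.get("highlight", {}).items())
    )


def md5_index_key(hit: Hit) -> Hashable:
    """Key on the (index, md5) pair; assumes every hit has an md5 in _source."""
    return (hit["_index"], hit["_source"]["md5"])


# Usage, with hits shaped like the three example documents:
md5 = "85e9e2c4a9c0a7af309c906516aa4548"
hits = [
    {"_index": "static_index", "_id": "1",
     "_source": {"md5": md5, "static_api": "ShellExecuteW"},
     "highlight": {"static_api.keyword": ["<em>ShellExecuteW</em>"]}},
    {"_index": "static_index", "_id": "2",
     "_source": {"md5": md5, "static_api": "ShellExecuteW"},
     "highlight": {"static_api.keyword": ["<em>ShellExecuteW</em>"]}},
    {"_index": "strings_index", "_id": "3",
     "_source": {"md5": md5, "strings_api": "ShellExecuteA"},
     "highlight": {"strings_api.keyword": ["<em>ShellExecuteA</em>"]}},
]

by_highlight = first_per_key(hits, highlight_key)
by_md5_index = first_per_key(hits, md5_index_key)
print([h["_id"] for h in by_highlight])  # ['1', '3']
print([h["_id"] for h in by_md5_index])  # ['1', '3']
```

Here "first" simply means first in response order. To treat the same highlighted text in different fields as one highlight, `highlight_key` could drop the field names and key on the fragments alone.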

First according to which doc value, though?

For my use case, any doc value will do (i.e., sort order doesn't matter). If aggregation/bucketing can produce the desired hits, any single match from each highlight bucket or each (md5 AND index) bucket would do.
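Since any match per bucket is acceptable, one way to express item 2 inside Elasticsearch might be a `terms` aggregation on `_index`, a nested `terms` aggregation on the md5 keyword field, and a `top_hits` sub-aggregation with `size: 1`, which can carry its own `highlight` block. Item 1 has no direct aggregation counterpart, because highlights are computed for returned hits rather than stored as field values. The sketch below builds that request body as a Python dict and flattens one hit per bucket out of the response. It assumes dynamic mapping created an `md5.keyword` sub-field (as it did for the `*.keyword` fields highlighted above) and that a bucket `size` of 100 covers every index and md5. The helper names and the bucket labels `by_index`, `by_md5` and `one_hit` are hypothetical.

```python
from typing import Any

Body = dict[str, Any]


def build_md5_index_request(query: str, bucket_size: int = 100) -> Body:
    """Search body: one top hit per (_index, md5.keyword) bucket.

    Assumes an md5.keyword sub-field exists and that bucket_size
    (a hypothetical default of 100) covers every index and md5 value.
    """
    return {
        "size": 0,  # skip the flat hit list; the buckets carry the hits
        "query": {
            "bool": {
                "must": {
                    "query_string": {"query": query, "analyze_wildcard": True}
                }
            }
        },
        "aggs": {
            "by_index": {
                "terms": {"field": "_index", "size": bucket_size},
                "aggs": {
                    "by_md5": {
                        "terms": {"field": "md5.keyword", "size": bucket_size},
                        "aggs": {
                            # Any single match per bucket is acceptable, so size 1.
                            "one_hit": {
                                "top_hits": {
                                    "size": 1,
                                    "highlight": {"fields": {"*.keyword": {}}},
                                }
                            }
                        },
                    }
                },
            }
        },
    }


def hits_from_buckets(response: Body) -> list[Body]:
    """Flatten the single top hit of every (_index, md5) bucket into one list."""
    return [
        md5_bucket["one_hit"]["hits"]["hits"][0]
        for index_bucket in response["aggregations"]["by_index"]["buckets"]
        for md5_bucket in index_bucket["by_md5"]["buckets"]
    ]


body = build_md5_index_request("*Execute*")
```

Sent as the body of `GET *_index/_search`, this should return one bucket per index and md5, each holding a single hit with its `highlight` section; `hits_from_buckets` then yields a list shaped like the desired reply's `hits`.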
