Field collapse - can't sort inner hits to take latest document?


(Thomas Millward Wright) #1

I'm trying to perform a search that returns the most recent document per unique value of the field "unique_hash". I'm using "field collapsing" (the "collapse" search option), like so:

{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "filtering_on_this": "value"
          }
        }
      ]
    }
  },
  "collapse": {
    "field": "unique_hash.keyword",
    "inner_hits": {
      "name": "latest",
      "size": 1,
      "sort": {
        "created_at": "desc"
      }
    }
  },
  "size": 30,
  "from": 30
}

As you can see, I am sorting the inner hits on "created_at" in descending order. This field is mapped as a "date" type. However, this does not return the latest document for each value of "unique_hash"; it appears to return the earliest instead. Switching between "asc" and "desc" has no effect on the results returned.
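To make the expected result concrete, here is a minimal pure-Python sketch of what the collapse should return: group the documents by "unique_hash" and keep the one with the greatest "created_at". It only models the semantics of a descending inner-hit sort with "size": 1; it is not how Elasticsearch implements collapsing. It assumes "created_at" values are zero-padded "yyyy-MM-dd" strings (so string comparison is chronological), and the function name latest_per_hash is hypothetical.

```python
from typing import Dict, List


def latest_per_hash(docs: List[dict]) -> Dict[str, dict]:
    """Keep the most recent document for each unique_hash value.

    Models a collapse on unique_hash.keyword whose inner hits are sorted by
    created_at descending with size 1. Assumes created_at is a zero-padded
    yyyy-MM-dd string, so plain string comparison is chronological.
    """
    latest: Dict[str, dict] = {}
    for doc in docs:
        key = doc["unique_hash"]
        current = latest.get(key)
        # Replace the stored doc only if this one is strictly newer.
        if current is None or doc["created_at"] > current["created_at"]:
            latest[key] = doc
    return latest


docs = [
    {"unique_hash": "A", "created_at": "2018-01-01"},
    {"unique_hash": "A", "created_at": "2018-01-02"},
    {"unique_hash": "B", "created_at": "2017-12-31"},
]
print({k: v["created_at"] for k, v in latest_per_hash(docs).items()})
# {'A': '2018-01-02', 'B': '2017-12-31'}
```

Comparing this against the raw search response is a quick way to tell whether the query or the code reading the response is at fault.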

Any pointers would be much appreciated!


(Mark Harwood) #2

At first glance - shouldn't sort be an array?


(Thomas Millward Wright) #3

@Mark_Harwood Changing it to an array like:

'sort' => [{'created_at' => 'asc'}]

has no effect :frowning:


(Mark Harwood) #4

Working for me on 6.0 with this repro:

DELETE test
PUT test
{
  "settings": {
    "number_of_replicas": 0,
    "number_of_shards": 1
  },
  "mappings": {
    "doc": {
      "properties": {
        "filtering_on_this": {
          "type": "keyword"
        },
        "created_at": {
          "type": "date",
          "format": "yyyy-MM-dd"
        },
        "unique_hash": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}
POST test/doc/_bulk
{"index":{}}
{"filtering_on_this":"value","unique_hash":"A","created_at":"2018-01-1"}
{"index":{}}
{"filtering_on_this":"value","unique_hash":"A","created_at":"2018-01-2"}

Search request with an ascending sort:

GET test/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "filtering_on_this": "value"
          }
        }
      ]
    }
  },
  "collapse": {
    "field": "unique_hash.keyword",
    "inner_hits": {
      "name": "latest",
      "size": 1,
      "sort": {
        "created_at": "asc"
      }
    }
  }
}

The response contains a single doc, dated 2018-01-1.
With a desc sort, the response contains the single doc dated 2018-01-2.

Can you provide a reproducible simplified example like the above that demonstrates the error?


(Thomas Millward Wright) #5

Hey @Mark_Harwood. Unfortunately I cannot: I have found the source of my error, and it was not Elasticsearch. A minimal ORM layer of mine was grabbing the wrong part of the response. When I inspected the raw response, it became clear that the inner_hits part of the result contained the expected document.
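For anyone hitting the same symptom: with "collapse", each top-level hit is only the representative of its group under the top-level sort (here relevance, which ignores "created_at"), while the document chosen by the inner-hit sort sits under inner_hits.<name>.hits.hits. Below is a minimal Python sketch of reading the right part of the raw response, assuming it has already been parsed into a dict and the inner hits are named "latest" as in the queries above. The helper latest_docs is hypothetical, not part of any client library, and the sample response is a trimmed, illustrative shape modelled on the repro rather than real output.

```python
from typing import List


def latest_docs(response: dict, inner_name: str = "latest") -> List[dict]:
    """Return the _source of the first inner hit of every collapsed group.

    Reads hits.hits[*].inner_hits.<inner_name>.hits.hits[0] instead of the
    top-level hit, whose _source is just the group's representative under
    the top-level sort.
    """
    docs = []
    for hit in response["hits"]["hits"]:
        inner = hit["inner_hits"][inner_name]["hits"]["hits"]
        if inner:  # skip defensively if a group has no inner hits
            docs.append(inner[0]["_source"])
    return docs


# Trimmed, illustrative response for one collapsed group.
response = {
    "hits": {
        "hits": [
            {
                # Top-level hit: NOT necessarily the latest document.
                "_source": {"unique_hash": "A", "created_at": "2018-01-1"},
                "fields": {"unique_hash.keyword": ["A"]},
                "inner_hits": {
                    "latest": {
                        "hits": {
                            "hits": [
                                {"_source": {"unique_hash": "A", "created_at": "2018-01-2"}}
                            ]
                        }
                    }
                },
            }
        ]
    }
}
print(latest_docs(response))
# [{'unique_hash': 'A', 'created_at': '2018-01-2'}]
```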

Thanks for your help, sorry for wasting your time!


(Mark Harwood) #6

One of those bugs just waiting for a bigger audience :slight_smile:

