Distinct List of Hits after Filter

Hi all, I'm new to Elasticsearch and have made good progress with most of the queries I need to run. However, one still eludes me.

I have the following document structure.

{
  "resourceId": 101,
  "sourceId": 201,
  "sourceTypeId": 301,
  "attributeId": 401,
  "term": "The cat sat on the mat"
}

Multiple documents can share the same resourceId, but no two documents ever share both attributeId and resourceId (together they form my composite key).

I am currently running this query to get all results that match a sourceTypeId and an attributeId (or several of each) and a search term:

{
  "size": 20,
  "from": 0,
  "query" : {
    "bool": {
      "filter": [ {
        "terms": {
          "sourceTypeId": [ 1000150 ]
        }
      }, {
        "terms": {
          "attributeId": [ 1000697 ]
        }
      } ],
      "must": {
        "fuzzy": { "term": "cat" }
      }
    }
  }
}

However, I get duplicate resourceIds, which is to be expected. How would I go about extending this to return only documents with a distinct resourceId?

I am using the AWS Elasticsearch Service, so I am locked to version 2.3, if that helps.
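
One direction I have been considering, as a rough and untested sketch: keep the same filters and fuzzy clause, suppress the normal hits with "size": 0, and add a terms aggregation on resourceId so that each value comes back once as a bucket. The helper name distinct_resource_request, the aggregation name distinct_resources and the max_buckets parameter are my own placeholders. As far as I can tell this gives no from-style offset, and buckets are ordered by document count rather than relevance, so it does not solve the paging part.

```python
import json


def distinct_resource_request(source_type_ids, attribute_ids, text, max_buckets=20):
    """Build a search body that buckets matching documents by resourceId.

    Hypothetical helper: the function, aggregation and parameter names
    are placeholders, not part of any Elasticsearch client library.
    """
    return {
        "size": 0,  # skip the regular hits, only the buckets are wanted
        "query": {
            "bool": {
                "filter": [
                    {"terms": {"sourceTypeId": source_type_ids}},
                    {"terms": {"attributeId": attribute_ids}},
                ],
                "must": {"fuzzy": {"term": text}},
            }
        },
        "aggs": {
            "distinct_resources": {
                # one bucket per distinct resourceId among the matches
                "terms": {"field": "resourceId", "size": max_buckets}
            }
        },
    }


body = distinct_resource_request([1000150], [1000697], "cat")
print(json.dumps(body, indent=2))
```

The printed JSON is what I would send as the search request body.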

I realized people might need more data to help out, so given this example set of documents:

{
  "resourceId": 101,
  "sourceId": 201,
  "sourceTypeId": 301,
  "attributeId": 401,
  "term": "The cat sat on the mat"
}, {
  "resourceId": 101,
  "sourceId": 201,
  "sourceTypeId": 301,
  "attributeId": 402,
  "term": "Fat"
}, {
  "resourceId": 102,
  "sourceId": 201,
  "sourceTypeId": 301,
  "attributeId": 401,
  "term": "Prat"
}, {
  "resourceId": 102,
  "sourceId": 201,
  "sourceTypeId": 301,
  "attributeId": 402,
  "term": "Double drat!"
}, {
  "resourceId": 103,
  "sourceId": 201,
  "sourceTypeId": 301,
  "attributeId": 404,
  "term": "Nothing fits here"
}

I want to run a fuzzy query on the field "term" for the text "at" and get back the following resourceIds:

[ 101, 102 ]

I need to be able to paginate over this data too, as there are potentially thousands of distinct resourceIds that will match.

So I believe I want some sort of size/from support that is applied after the query has run and the documents have been made distinct. Hope that's clearer?
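
To make the intended semantics concrete, here is a minimal client-side sketch in Python of what I am after. It assumes the four documents I expect to match have already come back as hits, and it only illustrates the desired result, not how Elasticsearch would compute it. The function name distinct_page and its offset/size parameters are my own placeholders (offset stands in for "from", which is a reserved word in Python).

```python
from collections import OrderedDict


def distinct_page(hits, key="resourceId", offset=0, size=20):
    """Return one page of distinct key values, in first-seen order."""
    seen = OrderedDict()
    for hit in hits:
        seen.setdefault(hit[key], hit)  # keep only the first hit per key
    # Paging is applied after de-duplication, which is the behaviour I want.
    return list(seen.keys())[offset:offset + size]


# Assumed hits: the four example documents I expect the query to match.
hits = [
    {"resourceId": 101, "attributeId": 401, "term": "The cat sat on the mat"},
    {"resourceId": 101, "attributeId": 402, "term": "Fat"},
    {"resourceId": 102, "attributeId": 401, "term": "Prat"},
    {"resourceId": 102, "attributeId": 402, "term": "Double drat!"},
]

print(distinct_page(hits))                    # [101, 102]
print(distinct_page(hits, offset=1, size=1))  # [102]
```

Doing this on the client would mean pulling every matching document first, which is exactly what I am hoping to avoid with thousands of matches.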
