Distinct List of Hits after Filter


(Craig McNicholas) #1

Hi all, I'm new to Elasticsearch and have made good progress with a number of the queries I need to run. However, one still eludes me.

I have the following document structure.

{
  "resourceId": 101,
  "sourceId": 201,
  "sourceTypeId": 301,
  "attributeId": 401,
  "term": "The cat sat on the mat"
}

Multiple documents can share the same resourceId, but no two documents ever share the same combination of attributeId and resourceId (my composite key).

I am currently running this query to get all results that match one or more sourceTypeId values, one or more attributeId values, and a search term:

{
  "size": 20,
  "from": 0,
  "query" : {
    "bool": {
      "filter": [ {
        "terms": {
          "sourceTypeId": [ 1000150 ]
        }
      }, {
        "terms": {
          "attributeId": [ 1000697 ]
        }
      } ],
      "must": {
        "fuzzy": { "term": "cat" }
      }
    }
  }
}

However, I get duplicate resource IDs, which is to be expected. How would I go about extending this query to return only documents with a distinct resourceId?

I am using AWS Elasticsearch, so I am locked to version 2.3, if that helps.


(Craig McNicholas) #2

I realized people might need more data to help out, so here is an example set of documents:

{
  "resourceId": 101,
  "sourceId": 201,
  "sourceTypeId": 301,
  "attributeId": 401,
  "term": "The cat sat on the mat"
}, {
  "resourceId": 101,
  "sourceId": 201,
  "sourceTypeId": 301,
  "attributeId": 402,
  "term": "Fat"
}, {
  "resourceId": 102,
  "sourceId": 201,
  "sourceTypeId": 301,
  "attributeId": 401,
  "term": "Prat"
}, {
  "resourceId": 102,
  "sourceId": 201,
  "sourceTypeId": 301,
  "attributeId": 402,
  "term": "Double drat!"
}, {
  "resourceId": 103,
  "sourceId": 201,
  "sourceTypeId": 301,
  "attributeId": 404,
  "term": "Nothing fits here"
}

I want to run a fuzzy query on the field "term" for the text "at" and return the following resource IDs:

[ 101, 102 ]

I also need to be able to paginate over this data, as there are potentially thousands of distinct resource IDs that will match.

So I believe I need some sort of size/from support that applies after the query has run and the documents have been made distinct. Hope that's clearer.
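
To make the "distinct first, then size/from" order concrete, here is a rough sketch of one direction that might work on 2.x. It is an assumption on my part, not a confirmed solution. It keeps the same bool query, adds a terms aggregation on resourceId, and slices the returned bucket keys on the client. The function names, the aggregation name "distinct_resources" and the sample response are made up for illustration, and "size": 0 inside the terms aggregation relies on the 2.x behaviour of returning all buckets.

```python
def build_distinct_query(source_type_ids, attribute_ids, text):
    """Build a request body: the original bool query plus a terms aggregation."""
    return {
        "size": 0,  # no hits wanted, only the aggregation buckets
        "query": {
            "bool": {
                "filter": [
                    {"terms": {"sourceTypeId": source_type_ids}},
                    {"terms": {"attributeId": attribute_ids}},
                ],
                "must": {"fuzzy": {"term": text}},
            }
        },
        "aggs": {
            # hypothetical aggregation name; one bucket per distinct resourceId
            "distinct_resources": {
                # size 0: assumed 2.x behaviour of returning every bucket
                "terms": {"field": "resourceId", "size": 0}
            }
        },
    }


def page_of_resource_ids(response, offset, limit):
    """Apply from/size on the client, after the IDs are already distinct."""
    buckets = response["aggregations"]["distinct_resources"]["buckets"]
    keys = sorted(bucket["key"] for bucket in buckets)  # stable order across pages
    return keys[offset:offset + limit]


# Hand-written response in the shape a terms aggregation returns (not real output).
sample_response = {
    "aggregations": {
        "distinct_resources": {
            "buckets": [
                {"key": 102, "doc_count": 1},
                {"key": 101, "doc_count": 2},
            ]
        }
    }
}

body = build_distinct_query([1000150], [1000697], "cat")
first_page = page_of_resource_ids(sample_response, offset=0, limit=20)
```

The obvious trade-off is that every request pulls back all matching buckets before slicing, which may be heavy with thousands of distinct resource IDs.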

