Find documents that only have terms intersecting a list of terms but no other terms


(Adam Feuer) #1

I have documents that have a list of labels:

{
   "fields": {
      "label": [
           "foo",
           "bar",
           "baz"
      ],
      "name": [
         "Document One"
      ],
      "description" : "A fine first document",
      "id" : 1
   }
},
{
   "fields": {
      "label": [
           "foo",
           "dog"
      ],
      "name": [
         "Document Two"
      ],
      "description" : "A fine second document",
      "id" : 2
   }
}

I have a list of terms:

[ "foo", "bar", "qux", "zip", "baz"]

I want a query that returns only documents whose labels all appear in the list of terms, with no labels outside it.

So given the list above, the query would return Document One but not Document Two (because it has the label dog, which is not in the list of terms).

I was able to achieve this with a Groovy script (here are the details in a StackOverflow answer).

However, we're interested in using Amazon's new Elasticsearch service offering. This does not allow dynamic scripting.

Is there another way to accomplish this, without using scripting?

cheers
adam

Adam Feuer
CookBrite, Inc.
Seattle, WA, USA


(Isabel Drost-Fromm) #2

None that I know of off the top of my head, other than maybe postponing the "filter out documents that contain tags not in the list" step to the application side. Maybe others can add more here if I'm missing something obvious.
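To illustrate the application-side idea, here is a minimal sketch in Python. It assumes the hits have already been fetched and have the same shape as the example documents in the question; the function name only_allowed_labels is made up for illustration and no Elasticsearch client is involved.

```python
def only_allowed_labels(hits, allowed_terms):
    """Keep hits whose labels are all contained in allowed_terms.

    Hypothetical helper: assumes each hit looks like the documents in the
    question, i.e. {"fields": {"label": [...], ...}}.
    """
    allowed = set(allowed_terms)
    kept = []
    for hit in hits:
        labels = set(hit.get("fields", {}).get("label", []))
        # A document qualifies only if its label set is a subset of the list.
        if labels <= allowed:
            kept.append(hit)
    return kept


# The two example documents from the question, trimmed to the relevant fields.
hits = [
    {"fields": {"label": ["foo", "bar", "baz"], "name": ["Document One"], "id": 1}},
    {"fields": {"label": ["foo", "dog"], "name": ["Document Two"], "id": 2}},
]
allowed_terms = ["foo", "bar", "qux", "zip", "baz"]

kept = only_allowed_labels(hits, allowed_terms)
print([hit["fields"]["name"][0] for hit in kept])
```

In practice you would probably first narrow the result set with an ordinary terms query on label, so that only documents sharing at least one term with the list reach this filter. Note that a document with no labels at all passes the subset check.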

As an aside - not sure if that's an option for you: https://www.elastic.co/found provides a hosted ES solution as well and does allow for dynamic scripting.

Isabel


(Andrew Barraclough) #3

I had to do something similar recently and ended up creating parent/child indexes and then searching on the children, which gives you only the results you want. You can then query or filter based on the parent document to get the results you are after.
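One possible reading of this, sketched in Python as a plain request body: if each label were indexed as its own child document, a has_child query could exclude every parent that has a child whose label is outside the list. This is an assumption about how such a setup might look, not necessarily what was done here. The child type name label_doc, the field name label, and the function name are hypothetical, and the label field is assumed to be not_analyzed so that terms matches exact values.

```python
import json


def only_listed_labels_query(allowed_terms, child_type="label_doc", field="label"):
    """Build a request body matching parents with no child label outside the list.

    Hypothetical sketch: child_type and field are illustrative names, and the
    mapping is assumed to store one label per child document.
    """
    # Inner query: child documents whose label is NOT one of the allowed terms.
    disallowed_child = {
        "bool": {
            "must": {"match_all": {}},
            "must_not": {"terms": {field: list(allowed_terms)}},
        }
    }
    # Outer query: parents that do NOT have any such disallowed child.
    return {
        "query": {
            "bool": {
                "must": {"match_all": {}},
                "must_not": {
                    "has_child": {"type": child_type, "query": disallowed_child}
                },
            }
        }
    }


body = only_listed_labels_query(["foo", "bar", "qux", "zip", "baz"])
print(json.dumps(body, indent=2))
```

A parent with no label children at all would also match this body, so it may need an additional has_child clause requiring at least one allowed label. No scripting is involved, only the bool, terms, match_all, and has_child queries.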

