Filter results by another query

I have a use case where I want to search for documents that do not have counterparts in the index.

For example, my documents look like this:

{
    "file": "myfile.ext",
    "classification": "class-1",
    "source": "source-1"
}

There may be several documents for the same file, each with a different source and classification:

{
    "file": "myfile.ext",
    "classification": "class-2",
    "source": "source-2"
}

I want a query that returns a document only if no document with the same file (including itself) has a classification field.

For example, given the following data:

[
    {
        "file": "file-1",
        "classification": "class-2",
        "source": "source-2"
    },
    {
        "file": "file-1",
        "source": "source-5"
    },
    {
        "file": "file-2",
        "source": "source-3"
    }
]

Only the last document (file-2) would be returned, because file-1 already has a counterpart with a classification.
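To make the expected result concrete, here is a minimal sketch in plain Python (not Elasticsearch) of the filtering I mean, run against the sample data above. The function name `unclassified_only` is hypothetical and only there for illustration.

```python
from collections import defaultdict

# The sample data from above.
docs = [
    {"file": "file-1", "classification": "class-2", "source": "source-2"},
    {"file": "file-1", "source": "source-5"},
    {"file": "file-2", "source": "source-3"},
]


def unclassified_only(documents):
    """Return documents whose file has no document with a classification."""
    # Group all documents by their "file" value.
    by_file = defaultdict(list)
    for doc in documents:
        by_file[doc["file"]].append(doc)

    # Keep a group only if none of its documents carries "classification".
    result = []
    for group in by_file.values():
        if not any("classification" in doc for doc in group):
            result.extend(group)
    return result


print(unclassified_only(docs))
# prints [{'file': 'file-2', 'source': 'source-3'}]
```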

Is this even possible in Elasticsearch?

Initially I thought about using field collapsing on the file property and somehow filtering out results whose inner hits had the classification field, but I'm not sure that's possible.

The next option might be an aggregation that applies filters, with a sub-aggregation that shows top hits or similar, but I'm not sure whether that's possible either.
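A rough sketch of what that aggregation idea might look like, written as a Python dict for the search request body. It combines a `terms` aggregation on the file, a `filter` sub-aggregation using an `exists` query on classification, a `bucket_selector` that keeps only buckets where that filter matched nothing, and `top_hits` to return the documents. This is untested against a real cluster, and several things here are assumptions: the helper name `build_query` is hypothetical, `file` is assumed to be a keyword field (with default dynamic mapping it would probably be `file.keyword`), and the sizes are placeholders.

```python
import json


def build_query(file_field="file", max_files=1000, docs_per_file=10):
    """Build a search body that groups by file and drops classified groups.

    Assumptions: file_field is aggregatable (keyword), and the two sizes
    are arbitrary placeholders, not recommended values.
    """
    return {
        "size": 0,  # only the aggregation results are of interest
        "aggs": {
            "by_file": {
                "terms": {"field": file_field, "size": max_files},
                "aggs": {
                    # Counts documents in this file bucket that have a
                    # classification field.
                    "classified": {
                        "filter": {"exists": {"field": "classification"}}
                    },
                    # Keeps the bucket only when that count is zero.
                    "only_unclassified": {
                        "bucket_selector": {
                            "buckets_path": {"n": "classified>_count"},
                            "script": "params.n == 0",
                        }
                    },
                    # Returns the documents of each surviving bucket.
                    "docs": {"top_hits": {"size": docs_per_file}},
                },
            }
        },
    }


print(json.dumps(build_query(), indent=2))
```

As far as I understand, `bucket_selector` only prunes buckets after the `terms` aggregation has already picked its top `size` buckets, so with many distinct files some unclassified ones could be missed unless `size` is large enough or the buckets are paged through in some other way.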

Any help/advice appreciated.
