I have a use case where I want to search for documents that do not have counterparts in the index.
For example, I have documents in this format:
{
"file": "myfile.ext",
"classification": "class-1",
"source": "source-1"
}
I might have multiple documents for the same file, with different sources and different classifications:
{
"file": "myfile.ext",
"classification": "class-2",
"source": "source-2"
}
I want to run a query that returns only documents for which no counterpart (another document with the same file) has a classification.
E.g., given the following data:
[
{
"file": "file-1",
"classification": "class-2",
"source": "source-2"
},
{
"file": "file-1",
"source": "source-5"
},
{
"file": "file-2",
"source": "source-3"
}
]
only the last document (file-2) would be returned.
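To pin down the semantics, here is a plain Python sketch of the filter I'm after. It is not Elasticsearch code, just the logic applied to the sample data above; the function name unclassified_files is made up for illustration.

```python
from collections import defaultdict

# Sample data from the question.
docs = [
    {"file": "file-1", "classification": "class-2", "source": "source-2"},
    {"file": "file-1", "source": "source-5"},
    {"file": "file-2", "source": "source-3"},
]


def unclassified_files(documents):
    """Return documents whose file has no document carrying a classification."""
    # Group documents by their file value.
    by_file = defaultdict(list)
    for doc in documents:
        by_file[doc["file"]].append(doc)

    # Keep only groups in which no document has a classification field.
    result = []
    for group in by_file.values():
        if not any("classification" in d for d in group):
            result.extend(group)
    return result


print(unclassified_files(docs))
# prints [{'file': 'file-2', 'source': 'source-3'}]
```

file-1 is dropped entirely because one of its documents has a classification, even though the other one does not.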
Is this even possible in Elasticsearch?
Initially I thought about using field collapsing on the file property and somehow filtering out results whose inner hits have the classification field, but I'm not sure that's possible.
The next option might be an aggregation that applies filters, with a sub-aggregation that shows top hits or similar? But I'm not sure whether that would work either.
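For what it's worth, this is roughly the request body I have in mind for the aggregation route, written as a Python dict so it can be dumped to JSON. It is an untested sketch: it assumes file is mapped as a keyword field (otherwise something like file.keyword would be needed), and build_unclassified_query, max_files, and the aggregation names are hypothetical. It combines a terms aggregation on file, a filter sub-aggregation counting documents that have a classification, and a bucket_selector that keeps only buckets where that count is zero.

```python
import json


def build_unclassified_query(file_field="file", max_files=1000):
    """Build a search body that buckets by file and keeps unclassified buckets."""
    return {
        "size": 0,  # only the aggregation results are of interest
        "aggs": {
            "by_file": {
                # One bucket per distinct file value (assumed keyword field).
                "terms": {"field": file_field, "size": max_files},
                "aggs": {
                    # Counts documents in the bucket that have a classification.
                    "with_classification": {
                        "filter": {"exists": {"field": "classification"}}
                    },
                    # Drops buckets where at least one document is classified.
                    "only_unclassified": {
                        "bucket_selector": {
                            "buckets_path": {
                                "classified": "with_classification._count"
                            },
                            "script": "params.classified == 0",
                        }
                    },
                    # Returns a few of the documents in each surviving bucket.
                    "docs": {"top_hits": {"size": 10}},
                },
            }
        },
    }


body = build_unclassified_query()
print(json.dumps(body, indent=2))
```

One concern with this shape is that the terms aggregation only returns up to max_files buckets before the selector runs, so an index with many distinct files would presumably need some form of paging over the buckets.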
Any help/advice appreciated.