I'm trying to get an exact count of documents that meet specific criteria. Aggregation queries are somewhat helpful, but I also need to return the matching results, so I'm hoping someone knows a good way to approach this problem. Here is what some of my documents look like:
{
  "doc_id": 1,
  "doc_type": "foo"
}
{
  "doc_id": 1,
  "doc_type": "foo"
}
{
  "doc_id": 2,
  "doc_type": "foo"
}
{
  "doc_id": 2,
  "doc_type": "bar"
}
The criteria I'm searching for: documents that share the same doc_id but have more than one unique value for doc_type. In the above example, doc_id = 1 would be fine and not picked up by my query, but doc_id = 2 is bad, and I need to capture both the doc_id and the fact that it counts as 1 instance of a result meeting my criteria. Does anyone know a good method to generate this information quickly?
Currently I have some Python code that generates a list of every doc_id, then searches on each one individually and collects the unique doc_type values, but that's not very quick and I have millions of documents. Is there a better way to go about this? I know a cardinality aggregation would work, but my understanding is that its counts aren't exact, and since this spans multiple shards I'm not sure I can rely on those results. I'm hoping there is a more efficient approach than my current one.
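To make the expected output concrete, here is a minimal in-memory Python sketch of the result I'm after, run on the four sample documents above. It does not query Elasticsearch at all, and the names `find_conflicting_doc_ids` and `sample_docs` are made up for illustration; it only shows which doc_id values should come back and what the exact count should be.

```python
from collections import defaultdict


def find_conflicting_doc_ids(docs):
    """Return the doc_ids that appear with more than one distinct doc_type.

    Hypothetical helper for illustration only; it works on a plain list of
    dicts held in memory, not on an Elasticsearch index.
    """
    types_by_id = defaultdict(set)
    for doc in docs:
        # Collect the set of distinct doc_type values seen for each doc_id.
        types_by_id[doc["doc_id"]].add(doc["doc_type"])
    # Keep only the doc_ids that have two or more distinct doc_type values.
    return sorted(
        doc_id for doc_id, types in types_by_id.items() if len(types) > 1
    )


# The four sample documents from the question.
sample_docs = [
    {"doc_id": 1, "doc_type": "foo"},
    {"doc_id": 1, "doc_type": "foo"},
    {"doc_id": 2, "doc_type": "foo"},
    {"doc_id": 2, "doc_type": "bar"},
]

conflicting = find_conflicting_doc_ids(sample_docs)
print(conflicting)       # [2]
print(len(conflicting))  # 1
```

So for the sample data I want doc_id 2 returned, with an exact count of 1, and I need the same answer computed by Elasticsearch across millions of documents.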