Hey,
I've been a long-time user of the _field_stats API for index filtering. I have two production systems in place where queries typically filter on a category identifier (an integer ID) and a date.
Some time ago I upgraded to 6.x, and consequently could no longer use that API to pre-filter indices. Since then, I've been unimpressed with these queries' performance. These systems have evolved over time, so I can't blame the API removal entirely, but I'd like to understand the alternative that was put in place.
My indices generally take the form of, for example: index-name-{CategoryId}-{yyyyMM}.
When I run a term query on the field corresponding to {CategoryId}, and a range query on the field corresponding to {yyyyMM}, I sometimes get back a _shards element with a non-zero "skipped" count. Is that coming from the pre-filtering that's done at query time?
Similarly, I often find that there are no skipped shards, even when I know there should be.
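To make the comparison concrete, here is a minimal sketch of how the counts in the _shards element can be pulled out of a search response that has already been parsed into a dict. The helper name summarize_shards is made up for illustration, and it assumes only the "total", "skipped" and "failed" keys of _shards; it says nothing about why a shard was or wasn't skipped.

```python
def summarize_shards(response):
    """Summarize the _shards element of a parsed search response (a dict).

    Hypothetical helper: it only reads the counts, treating any missing
    key as zero.
    """
    shards = response.get("_shards", {})
    total = shards.get("total", 0)
    skipped = shards.get("skipped", 0)
    return {
        "total": total,
        "skipped": skipped,
        # Shards that were not reported as skipped.
        "not_skipped": total - skipped,
        "failed": shards.get("failed", 0),
    }


if __name__ == "__main__":
    # Made-up response fragment, for illustration only.
    example = {"_shards": {"total": 120, "successful": 120, "skipped": 95, "failed": 0}}
    print(summarize_shards(example))
```

Logging this summary next to each query makes it easier to see which queries skip shards and which don't.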
I've had to go back to a most unfortunate workaround of late, wherein I expand the index names myself, especially for {CategoryId}. This improves query time substantially, by more than the few ticks the _field_stats call would have cost me.
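For illustration, the workaround looks roughly like the sketch below. It assumes the index-name-{CategoryId}-{yyyyMM} pattern described above; the function name expand_index_names and the sample values are hypothetical, not taken from my actual code.

```python
from datetime import date


def expand_index_names(prefix, category_ids, start, end):
    """Build explicit index names of the form prefix-{CategoryId}-{yyyyMM}
    for every category and every month from start to end, inclusive."""
    names = []
    for category_id in category_ids:
        year, month = start.year, start.month
        # Walk month by month until we pass the end month.
        while (year, month) <= (end.year, end.month):
            names.append(f"{prefix}-{category_id}-{year:04d}{month:02d}")
            month += 1
            if month > 12:
                year, month = year + 1, 1
    return names


if __name__ == "__main__":
    # Made-up category IDs and date range.
    indices = expand_index_names("index-name", [7, 42], date(2018, 11, 15), date(2019, 1, 3))
    # The comma-separated list goes into the search URL in place of a wildcard.
    print(",".join(indices))
```

The resulting list is what I pass as the target of the search instead of a wildcard like index-name-*.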
Is there a way for me to troubleshoot this pre-filter phase? Does it apply only to range queries, and not term/terms ones?
I've tried running the profile option, and tons of indices show up. But I don't know whether an index showing up there means it was actually searched, or whether pre-filtered indices are listed there too. They don't look like they're being filtered out; they have all the usual nodes that an index with no match would have. But I'm not sure.
So yeah, basically just looking for docs (I haven't found any reference to the "skipped" response field at all), or anecdotal suggestions, or anything that can help me understand which shards are skipped and why, and which aren't and why.
I'm definitely a proponent of the engine being intelligent enough to do this; I just need some more visibility into why it might not be working for me.
Thanks,
Matthew