FieldStats support

Hello
FieldStats is removed in ES 6.0. Wanted to know if the functionality is provided by any other API or combination of APIs.
We have been using it to narrow down which indices to search; the indices store time series data. We use the FieldStatsRequestBuilder with index constraints to find the matching indices.

We have also been using FieldStats to get min and max values stored in a field.
Is there a way to get these in absence of FieldStats?

regards
Gopal

Hey,

I assume you filtered down, because you did not want to spread your search across a wide number of shards? Elasticsearch added safeguards against this (running several rounds), so this should not be a concern. See https://github.com/elastic/elasticsearch/pull/25658

Could min/max values just become an aggregation in your case?
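
As a rough illustration of that suggestion, here is a minimal sketch of a search body that asks only for the min and max of one field. The field name "@timestamp" and the aggregation names min_value/max_value are placeholders, not anything from your setup; the body would be sent to the _search endpoint of the indices you care about.

```python
import json


def min_max_body(field):
    """Build a search body that returns only the min and max of `field`."""
    return {
        "size": 0,  # no hits needed, only the aggregation results
        "aggs": {
            "min_value": {"min": {"field": field}},  # hypothetical agg name
            "max_value": {"max": {"field": field}},  # hypothetical agg name
        },
    }


# "@timestamp" is an assumed field name, used only for illustration.
body = min_max_body("@timestamp")
print(json.dumps(body, indent=2))
```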

--Alex

Thanks. The min/max aggregation should work for us. Just wondering how field_stats and the min/max aggregation compare on performance.

For the safeguard change: "This change adds a pre-filter phase for searches that can, if the number of shards are higher than the pre_filter_shard_size threshold (defaults to 128 shards), fan out to the shards and check if the query can potentially match any documents at all." It is not entirely clear to me from the PR how this works. If you can give more details about how the pre-filter uses the query, and which fields from the query it may use to filter shards out, that would help me understand it.

Thanks
Gopal

It should actually be more performant, as computing only min/max involves fewer calculations than field_stats was doing.

The magic happens in SearchService.canMatch()

ES is able to rewrite queries internally. Some of those queries, for example, get rewritten to a MatchNoneQuery on a given shard, and that shard can then be skipped entirely.
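
To make the idea concrete, here is a toy model of a can-match check for a range query. It is only a sketch of the principle, not the actual Elasticsearch code: the ShardStats class, the can_match function, and the shard names are all made up for illustration, and it assumes each shard knows the min and max value of the queried field.

```python
from dataclasses import dataclass


@dataclass
class ShardStats:
    """Hypothetical per-shard summary: min and max value of one field."""
    name: str
    min_value: int
    max_value: int


def can_match(shard, gte, lte):
    """Return False when the range [gte, lte] cannot overlap the shard's values.

    A shard for which this returns False is the toy equivalent of the query
    being rewritten to a match-none query there, so it can be skipped.
    """
    return not (lte < shard.min_value or gte > shard.max_value)


shards = [
    ShardStats("logs-2017.01", 0, 99),
    ShardStats("logs-2017.02", 100, 199),
    ShardStats("logs-2017.03", 200, 299),
]

# A range query for values 150..250 only needs the last two shards.
to_search = [s.name for s in shards if can_match(s, 150, 250)]
print(to_search)
```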

I haven't seen the min/max aggregation being more performant than _field_stats in 5.6. In my cluster, if I do a _field_stats for one field, I usually get it back for all indices in less than a second. If I do a min/max aggregation for the same field inside an _index aggregation (so I get the same data as _field_stats), it takes more than 10 seconds.
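
For reference, the kind of request being compared here could look like the sketch below: a terms aggregation on _index with min and max sub-aggregations. The field name, the aggregation names, and the bucket limit of 1000 are assumptions for illustration, not the exact request from this cluster.

```python
import json


def per_index_min_max_body(field, max_indices=1000):
    """Build a body that reports min/max of `field` separately per index."""
    return {
        "size": 0,  # only aggregation results are needed
        "aggs": {
            "by_index": {  # hypothetical agg name
                # One bucket per index; max_indices caps the bucket count.
                "terms": {"field": "_index", "size": max_indices},
                "aggs": {
                    "min_value": {"min": {"field": field}},
                    "max_value": {"max": {"field": field}},
                },
            }
        },
    }


# "@timestamp" is an assumed field name.
per_index_body = per_index_min_max_body("@timestamp")
print(json.dumps(per_index_body, indent=2))
```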

@spinscale @theuntergeek Can you please comment on the performance observation made here by @trevan? Why might the aggregations be slower?

@gphadke, @trevan that can only happen in field_stats if the results have been cached in memory. That's why it can return that quickly. If the results are not cached (for example, you just restarted each node in the cluster, so nothing is in memory), then the field_stats query will take more time, too, as it has to go to the indices for the results (and then cache them).

Aggregations using doc_values will also cache, but in the filesystem cache, rather than the JVM. Performance will depend on the operating system's abilities at that point, and whether those values stay in the cache.

@theuntergeek, would _cache/clear have the same effect on field_stats? Because I've done that and it is still 1-2 seconds at most. The only time I've ever seen field_stats and an equivalent aggregation run at the same speed is when the equivalent aggregation is in the query cache, which never seems to happen under real-world conditions since the timespan is always changing. Otherwise, field_stats, with and without caching, has always been much faster than an aggregation.

That API only clears the filter cache, if I recall correctly. field_stats is safe to use within constraints, but can balloon memory out of control. Use it with caution.

You could however use the shard request cache to cache the aggregation results.
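
A minimal sketch of what such a request might look like, assuming the request_cache=true search parameter and a size of 0. The helper name, index pattern, and field are placeholders. The time range uses fixed epoch-millisecond bounds rather than "now", since requests that use "now" are not cached; rounding the bounds so that repeated requests are identical is an assumption about how you would want to use this.

```python
def cached_min_max_request(index, field, start_ms, end_ms):
    """Describe a min/max search that opts in to the shard request cache."""
    return {
        "method": "GET",
        "path": f"/{index}/_search",
        # Opt in to the shard request cache for this request.
        "params": {"request_cache": "true"},
        "body": {
            "size": 0,  # aggregation-only request
            "query": {
                "range": {
                    # Explicit bounds instead of "now", so that an identical
                    # request can be answered from the cache.
                    field: {
                        "gte": start_ms,
                        "lte": end_ms,
                        "format": "epoch_millis",
                    }
                }
            },
            "aggs": {
                "min_value": {"min": {"field": field}},
                "max_value": {"max": {"field": field}},
            },
        },
    }


# Index pattern, field name, and bounds are assumed for illustration.
req = cached_min_max_request("logs-*", "@timestamp", 1500000000000, 1500003600000)
print(req["path"], req["params"])
```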
