FieldStats support

gphadke · November 10, 2017, 7:23am

Hello
FieldStats has been removed in ES 6.0. I wanted to know whether its functionality is provided by any other API or combination of APIs.
We have been using it to narrow down the number of indices to search, since they store time series data. We use the FieldStatsRequestBuilder with constraints to find the relevant indices.

We have also been using FieldStats to get min and max values stored in a field.
Is there a way to get these in absence of FieldStats?

regards
Gopal

spinscale · November 10, 2017, 9:38am

Hey,

I assume you filtered the indices down because you did not want to spread your search across a large number of shards? Elasticsearch added safeguards against this (running several rounds), so this should not be a concern. See https://github.com/elastic/elasticsearch/pull/25658

Could the min/max values just become an aggregation in your case?
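
A minimal sketch of what such a request body could look like, written as a Python dict. The helper name `min_max_body`, the aggregation names `min_value`/`max_value`, and the `@timestamp` field are assumptions for illustration; only the `min` and `max` aggregation types and `size: 0` come from the search API.

```python
import json


def min_max_body(field):
    """Build a search body that returns only the min and max of `field`."""
    return {
        "size": 0,  # no hits needed, only the aggregation results
        "aggs": {
            "min_value": {"min": {"field": field}},
            "max_value": {"max": {"field": field}},
        },
    }


# Hypothetical field name; send the printed JSON to <index>/_search.
print(json.dumps(min_max_body("@timestamp"), indent=2))
```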

--Alex

gphadke · November 14, 2017, 6:40pm

Thanks. The min/max aggregation should work for us. I am just wondering how field_stats and the min/max aggregation compare on performance.

Regarding the safeguard change: "This change adds a pre-filter phase for searches that can, if the number of shards are higher than the pre_filter_shard_size threshold (defaults to 128 shards), fan out to the shards and check if the query can potentially match any documents at all." It is not entirely clear to me from the PR how this works. If you could give more details about how the pre-filter uses the query, and which fields from the query it may use to filter shards out, that would help me understand it.

Thanks
Gopal

theuntergeek · November 14, 2017, 6:57pm

It should actually be more performant, since computing only min/max involves fewer calculations than field_stats was doing.

spinscale · November 15, 2017, 9:40am

The magic happens in SearchService.canMatch()

ES is able to rewrite queries internally. Some of those queries, for example, get rewritten to a MatchNoneQuery, and the shards they target can then be skipped entirely.
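
As a rough illustration of the idea only (this is a toy model, not the code in SearchService.canMatch()): assume each shard knows the min and max of a field, and a range query that cannot overlap that interval is treated as matching nothing, so the shard is skipped. The `ShardStats` class, the shard names, and the numeric values are all made up.

```python
from dataclasses import dataclass


@dataclass
class ShardStats:
    """Hypothetical per-shard summary: smallest and largest value of one field."""
    name: str
    min_value: int
    max_value: int


def can_match(shard, gte, lte):
    """True if the range [gte, lte] overlaps the shard's [min, max] interval."""
    return shard.max_value >= gte and shard.min_value <= lte


def shards_to_search(shards, gte, lte):
    """Keep only shards where the range query could match at least one document."""
    return [s.name for s in shards if can_match(s, gte, lte)]


shards = [
    ShardStats("logs-2017.11.01", 100, 199),
    ShardStats("logs-2017.11.02", 200, 299),
    ShardStats("logs-2017.11.03", 300, 399),
]
print(shards_to_search(shards, 250, 320))
# → ['logs-2017.11.02', 'logs-2017.11.03']
```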

trevan · November 17, 2017, 6:29pm

I haven't seen the min/max aggregation being more performant than _field_stats in 5.6. In my cluster, if I run _field_stats for one field, I usually get results back for all indices in less than a second. If I run a min/max aggregation for the same field under an _index aggregation (so I get the same data as _field_stats), it takes more than 10 seconds.
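
For reference, a sketch of the kind of request being compared here: a terms aggregation on `_index` with min and max sub-aggregations, plus a small parser for the response. The function names, the aggregation names, and the `max_indices` default are assumptions; the response layout follows the standard `aggregations.<name>.buckets` shape.

```python
def per_index_min_max_body(field, max_indices=1000):
    """Search body: one bucket per index, each with the min and max of `field`."""
    return {
        "size": 0,
        "aggs": {
            "by_index": {
                # max_indices is an assumed upper bound on the number of indices
                "terms": {"field": "_index", "size": max_indices},
                "aggs": {
                    "min_value": {"min": {"field": field}},
                    "max_value": {"max": {"field": field}},
                },
            }
        },
    }


def parse_per_index(response):
    """Turn the aggregation response into {index_name: (min, max)}."""
    buckets = response["aggregations"]["by_index"]["buckets"]
    return {
        b["key"]: (b["min_value"]["value"], b["max_value"]["value"])
        for b in buckets
    }
```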

gphadke · December 5, 2017, 8:27am

@spinscale @theuntergeek Could you please comment on the performance observation made here by @trevan? Why might the aggregations be slower?

theuntergeek · December 5, 2017, 1:03pm

@gphadke, @trevan that can only happen with field_stats if the results have been cached in memory. That's why it can return that quickly. If the results are not cached (for example, you just restarted each node in the cluster, so nothing is in memory), then the field_stats query will take more time too, as it has to go to the indices for the results (and then cache them).

Aggregations using doc_values will also cache, but in the filesystem cache, rather than the JVM. Performance will depend on the operating system's abilities at that point, and whether those values stay in the cache.

trevan · December 5, 2017, 3:45pm

@theuntergeek, would _cache/clear have the same effect on field_stats? I've done that, and it still takes 1-2 seconds at most. The only time I've ever seen field_stats and an equivalent aggregation run at the same speed is when the aggregation is in the query cache, which never seems to happen under real-world conditions since the timespan is always changing. Otherwise, field_stats, with and without caching, has always been far faster than an aggregation.

theuntergeek · December 5, 2017, 4:07pm

That API only clears the filter cache, if I recall correctly. field_stats is safe to use within constraints, but it can balloon memory out of control. Use it with caution.

spinscale · December 6, 2017, 8:51am

You could, however, use the shard request cache to cache the aggregation results.
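
A hedged sketch of how that might look from a client: pass `request_cache=true` on the search and keep `size` at 0. Because the cache is keyed on the request body, a constantly shifting time range produces a new key each time, so this example rounds the range bounds to a fixed step. The helper names, the 5-minute step, and the field names are assumptions, not a recommendation from this thread.

```python
def round_down(epoch_ms, step_ms):
    """Round a millisecond timestamp down to a multiple of step_ms."""
    return epoch_ms - (epoch_ms % step_ms)


def cached_min_max_request(field, time_field, start_ms, end_ms, step_ms=300_000):
    """Return (query params, body) for a min/max search with rounded time bounds."""
    params = {"request_cache": "true"}  # explicitly opt in to the shard request cache
    body = {
        "size": 0,
        "query": {
            "range": {
                time_field: {
                    "gte": round_down(start_ms, step_ms),
                    # widen the upper bound to the next step boundary
                    "lt": round_down(end_ms, step_ms) + step_ms,
                    "format": "epoch_millis",
                }
            }
        },
        "aggs": {
            "min_value": {"min": {"field": field}},
            "max_value": {"max": {"field": field}},
        },
    }
    return params, body
```

Two requests whose start and end times fall in the same step produce identical bodies, which is what gives the cache a chance to be hit. The trade-off is that the searched range is slightly wider than the one asked for.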

system · January 3, 2018, 8:52am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
