Field_stats_api deprecated?

Hi,

About a year ago, lots of people from Elastic were singing the praises of the field_stats_api, specifically how it can be used to figure out which indices to target a search on. Kibana still uses this mechanism in the background. I came to implement something similar myself yesterday and found that it has been deprecated in favour of the field_capabilities API, which (as far as I can tell) doesn't support the one feature of the field_stats_api that I wanted. The documentation suggests that aggregations can be used instead, but in my scenario I have too many shards to search against. So I'm curious as to how Kibana is going to change in a future release to cope with the loss of the field_stats_api?

Here is the problem:
For a given index pattern 'daily_index-*' we have 30 days of indices. Let's assume that each index has 50 shards, which gives a total of 1500 shards. For a given time range, which could span multiple days, I want to know which indices contain data for that time range. The field_stats_api was able to tell you very quickly which indices contained data spanning the time range in question (current Kibana implementation).

An aggregated search against the timestamp will fail (unless I change some limits set in Elasticsearch), as it is hitting more than 1000 shards. The field capabilities API doesn't seem to have the same function. So apart from manually figuring out the index names, how can I get ES to efficiently tell me which indices contain the data I need to search against? I'm sure the clever people writing ES and Kibana have got a new way to efficiently do this and I'm keen to learn how.
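For reference, the "manually figuring out the index names" fallback could look roughly like the sketch below. It assumes one index per day named with a `%Y.%m.%d` suffix (for example `daily_index-2017.09.30`); the thread does not state the actual naming scheme, so the prefix, the date format, and the helper name `daily_indices` are all hypothetical.

```python
from datetime import date, timedelta
from typing import List


def daily_indices(start: date, end: date,
                  prefix: str = "daily_index-",
                  fmt: str = "%Y.%m.%d") -> List[str]:
    """Return one index name per calendar day in [start, end], inclusive.

    The prefix and date format are assumptions, not taken from the thread.
    """
    if end < start:
        raise ValueError("end must not be before start")
    days = (end - start).days
    return [prefix + (start + timedelta(days=i)).strftime(fmt)
            for i in range(days + 1)]


# Example: a range spanning three days maps to three index names.
names = daily_indices(date(2017, 9, 29), date(2017, 10, 1))
print(",".join(names))
```

The comma-joined names could then be used in place of the wildcard pattern in the search URL. This only works when index names reliably encode the day of the data they hold, which is exactly the bookkeeping the question is trying to avoid.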

Thanks in advance.

We'll be making changes to cater for it.

That (soft) limit was removed in 5.4 :slight_smile:

That I don't know, but I will ask around

@warkolm
Hi, I raised the question as a support ticket, this is the reply I got:

field_stats is deprecated and the replacement is a new endpoint called field_caps. field_stats was added to solve your problem, but in the meantime we have improved the handling of range queries a lot.
A shard can now very quickly assess whether a range query has hits or not, and can also cache the result of a range query depending on the min/max values on the shard. For these reasons you should not try to detect which indices contain data for the time range, but rather let ES do the job. You can just send your query to all indices and the different optimizations for ranges will apply automatically. Kibana is also moving away from field_stats and will also rely on the range optimizations.
Bottom line is that ES should do a good job with ranges and you should not have to filter the indices on your side.
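
A minimal sketch of what "send your query to all indices" could look like, for anyone landing here later. It only builds the request body; the field name `@timestamp` and the helper name `range_search_body` are assumptions, since the thread does not name the timestamp field.

```python
import json


def range_search_body(gte: str, lt: str, field: str = "@timestamp") -> dict:
    """Build a search body that filters on a time range.

    The range clause sits in a bool filter, so it does not affect scoring.
    The field name is an assumption; use whatever your mapping defines.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {"range": {field: {"gte": gte, "lt": lt}}}
                ]
            }
        }
    }


body = range_search_body("2017-09-29T00:00:00Z", "2017-10-01T00:00:00Z")
print(json.dumps(body, indent=2))
```

The body would be POSTed to `/daily_index-*/_search`, that is, to the whole index pattern rather than a hand-picked list, leaving it to each shard to decide quickly whether it can match the range.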



Also see "Search Scalability" from the release post of Elasticsearch 5.6.0.

As of 5.6.0, searches hitting >= 128 shards are subject to a light pre-filtering phase. Additionally, searches can only run on <max_concurrent_shard_requests || 256> shards concurrently, to help prevent overload.
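
Both behaviours can be tuned per request through the `pre_filter_shard_size` and `max_concurrent_shard_requests` search parameters. The sketch below only assembles the URL; the host, the index pattern, and the helper name `search_url` are placeholders, and the values shown simply echo the numbers quoted above rather than recommending them.

```python
from typing import Optional
from urllib.parse import urlencode


def search_url(base: str, index_pattern: str,
               pre_filter_shard_size: Optional[int] = None,
               max_concurrent_shard_requests: Optional[int] = None) -> str:
    """Build a _search URL, adding only the parameters that were given."""
    params = {}
    if pre_filter_shard_size is not None:
        params["pre_filter_shard_size"] = pre_filter_shard_size
    if max_concurrent_shard_requests is not None:
        params["max_concurrent_shard_requests"] = max_concurrent_shard_requests
    query = urlencode(params)
    url = f"{base}/{index_pattern}/_search"
    return f"{url}?{query}" if query else url


# Host and index pattern are placeholders.
print(search_url("http://localhost:9200", "daily_index-*",
                 pre_filter_shard_size=128,
                 max_concurrent_shard_requests=256))
```

Leaving both arguments unset yields a plain `_search` URL, so the cluster defaults apply.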