Scrolling and Paging in Elastic/NEST 6+

Hi guys,

Intro

I'm upgrading Elasticsearch to version 6.3 (previously we were using 5.4).
Our app is written in C#, so we use the NEST .NET client (DLL) to talk to the Elastic server, and we are also updating it to version 6.0.0.0.

The use case - Before

Until version 5, I was able to execute this query:

jsonStr =
{
  "from": 16224,
  "size": 12,
  "query": {
    "bool": {
      "filter": [
        {
          "bool": {
            "must": [
              {
                "terms": {
                  "COMPANY": [
                    "AMP Services Ltd"
                  ]
                }
              }
            ]
          }
        }
      ]
    }
  }
}

Using this NEST/C# code:

// Keep the scroll context alive for one minute
Func<SearchRequestParameters, SearchRequestParameters> requestParameters = a => a.Scroll(TimeSpan.FromMinutes(1));
response = Connection.Client.GetInstance().LowLevel.Search<dynamic>("myindex", new PostData<dynamic>(jsonStr), requestParameters);

And with that, I was able to fetch the data without problems.

The use case - NOW

Now, with version 6, I'm trying to execute this very same query:

jsonStr =
{
  "from": 16224,
  "size": 12,
  "query": {
    "bool": {
      "filter": [
        {
          "bool": {
            "must": [
              {
                "terms": {
                  "COMPANY": [
                    "AMP Services Ltd"
                  ]
                }
              }
            ]
          }
        }
      ]
    }
  }
}

Using this NEST/C# code (as the previous method signatures are no longer available):

SearchRequestParameters searchRequest = new SearchRequestParameters();
searchRequest.Scroll = new TimeSpan(0, 1, 0);
response = Connection.Client.GetInstance().LowLevel.Search<StringResponse>("myindex", PostData.String(jsonStr), searchRequest);

And I'm getting this error: "Validation Failed: 1: using [from] is not allowed in a scroll context;"

Documentation

I could not find anything here (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html) or here (https://www.elastic.co/guide/en/elasticsearch/client/net-api/current/sliced-scroll-search-usage.html) to help me replace this logic, and nothing in the forums either.

Do you guys have any insights on this?

Thanks

It looks to be related to a validation change in Elasticsearch 6.0.0. In 5.x, the from parameter was allowed on a scroll request but silently ignored. In 6.0.0, Elasticsearch is stricter: it validates whether from is present on a scroll request and, if it is, returns a bad response with an appropriate validation error.

Since a from parameter doesn't make sense for a scroll request, the solution is to do one of these two things:

  1. Remove the from parameter when using the Scroll API
  2. Continue to use the from parameter but do not use the Scroll API.
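To make option 1 concrete, here is a minimal sketch, written as a small C# 9+ top-level program, of the two request bodies: one for a scroll request (size and query only) and one for a plain from/size page. The helper names BuildScrollSearchBody and BuildPageBody are hypothetical, and the query is kept as a raw JSON string as in the question, so the result would still be passed through PostData.String(...) as before.

```csharp
using System;

// Hypothetical helper: body for the first request of a scroll. It keeps "size"
// and "query" and deliberately leaves out "from", which Elasticsearch 6.x
// rejects whenever a scroll timeout is set on the request.
string BuildScrollSearchBody(int size, string queryJson) =>
    $"{{\"size\":{size},\"query\":{queryJson}}}";

// Hypothetical helper: body for an ordinary from/size page. Send this one
// WITHOUT a Scroll request parameter, so it is treated as a normal search.
string BuildPageBody(int from, int size, string queryJson) =>
    $"{{\"from\":{from},\"size\":{size},\"query\":{queryJson}}}";

// The query clause from the question, unchanged.
const string companyQuery =
    "{\"bool\":{\"filter\":[{\"bool\":{\"must\":[{\"terms\":{\"COMPANY\":[\"AMP Services Ltd\"]}}]}}]}}";

Console.WriteLine(BuildScrollSearchBody(12, companyQuery));
Console.WriteLine(BuildPageBody(0, 12, companyQuery));
```

The scroll body is the one to send when searchRequest.Scroll is set; the page body only makes sense on a request that does not set it.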

As an aside, if you need to scroll many documents, you may want to use the ScrollAll() observable helper.

Hi @forloop

Thanks for taking the time to reply. But I'm afraid none of the options you suggested will work; I had actually already tried most of them:

"Since a from parameter doesn't make sense for a scroll request, the solution to this is to do one of these two"
Answer: This is a new limitation of Elastic, or a downgrade. It worked like a charm in the previous version of Elastic/NEST.

1. Remove the from parameter when using the Scroll API
Answer: That doesn't solve the problem. The use case is precisely the paging capability.

2. Continue to use the from parameter but do not use the Scroll API.
Answer: I've tried it. Ironically enough, the response from Elastic tells me to use the Scroll API: "Result window is too large, from + size must be less than or equal to: [10000] but was [16236]. See the scroll api for a more efficient way to request large data sets."

"As an aside, If you are needing to scroll many documents, you may want to use ScrollAll() observable helper to do so."
The LowLevel client doesn't have this option.

This is not a new limitation in Elasticsearch. Elasticsearch 5.x was more lenient and allowed the from parameter to be present, whereas Elasticsearch 6.x is stricter and fails validation when it is present. The presence of the parameter in 5.x had no effect on the scroll request. In 6.x, Elasticsearch is trying to be more helpful: instead of silently ignoring the parameter, it tells you that it shouldn't be present.

If you're paginating with from and size, then you must not specify a scroll timeout; this timeout is used for keeping the scroll cursor alive on the cluster. The presence of the scroll timeout signals to Elasticsearch that the request should initiate a scroll request, which is not the case if you'd like to paginate with from and size.

That isn't what the error indicates; the error is indicating that you're attempting to perform deep pagination using from and size, and that the value of from + size is larger than the index.max_result_window index setting, which defaults to 10000. This setting is in place to safeguard against deep pagination.
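The check behind that error is simple arithmetic. A minimal sketch (the helper name is hypothetical, and the window is assumed to be the default 10000 unless you pass another value) applied to the numbers from the question:

```csharp
using System;

// Mirrors the check behind the "Result window is too large" error: a from/size
// request fails when from + size is greater than index.max_result_window,
// which defaults to 10000.
bool ExceedsResultWindow(int from, int size, int maxResultWindow = 10000) =>
    from + size > maxResultWindow;

Console.WriteLine(ExceedsResultWindow(16224, 12)); // True: 16224 + 12 = 16236
Console.WriteLine(ExceedsResultWindow(9988, 12));  // False: 9988 + 12 = 10000 is allowed
```

So the page in the question is rejected without a scroll timeout no matter how the request is sent, unless the index setting is raised.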

You can increase the size of this setting if you want, but the error is suggesting that if you're wanting to do deep pagination, using the scroll API may be more efficient. When using the scroll API, you won't specify a from parameter and instead will scroll results from Elasticsearch using the scroll ID passed back on each response.
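To illustrate what scrolling means for a "page" like from=16224, size=12, here is an in-memory simulation, not real client code: the hit list, the ScrollNext function, and the int cursor standing in for the scroll ID are all hypothetical. It shows that reaching that offset through scroll means pulling and discarding every earlier hit, because a scroll only moves forward batch by batch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for all hits matching the query, in sort order.
var allHits = Enumerable.Range(0, 20000).Select(i => $"doc-{i}").ToList();

// Simulates one scroll round trip. The int cursor plays the role of the
// scroll_id: it tells the "server" where the previous batch ended.
(List<string> Batch, int Next) ScrollNext(int cursor, int batchSize)
{
    var batch = allHits.Skip(cursor).Take(batchSize).ToList();
    return (batch, cursor + batch.Count);
}

// Emulates a from/size "page" with scroll: every hit before `from` still has
// to be fetched and thrown away; scroll has no way to jump ahead.
List<string> PageViaScroll(int from, int size, int batchSize)
{
    var collected = new List<string>();
    int seen = 0, cursor = 0;
    while (collected.Count < size)
    {
        var (batch, next) = ScrollNext(cursor, batchSize);
        if (batch.Count == 0) break; // an empty batch means the scroll is exhausted
        foreach (var hit in batch)
        {
            if (seen >= from && collected.Count < size) collected.Add(hit);
            seen++;
        }
        cursor = next;
    }
    return collected;
}

var result = PageViaScroll(16224, 12, 1000);
Console.WriteLine($"{result.First()} .. {result.Last()}"); // doc-16224 .. doc-16235
```

This is why scroll is a good fit for walking a whole result set, but not for jumping straight to page N.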

ScrollAll() is part of the high level client, but you may want to use its implementation as the basis for an implementation using the low level client. It's provided to make it easier to scroll large document sets efficiently :slightly_smiling_face:
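For a low level client implementation, every request after the first one goes to the scroll endpoint rather than to the index. A minimal sketch of the two bodies involved (the helper names and the placeholder ID are hypothetical): read _scroll_id from each response, POST the continuation body to /_search/scroll until a response comes back with no hits, then DELETE the scroll context. The strings would be wrapped with PostData.String(...) as in the question; check which low level client method maps to these endpoints in your Elasticsearch.Net version.

```csharp
using System;

// Body for POST /_search/scroll: fetches the next batch and renews the scroll
// context's keep-alive. "1m" matches the one-minute TimeSpan in the question.
string BuildScrollContinuationBody(string scrollId, string keepAlive = "1m") =>
    $"{{\"scroll\":\"{keepAlive}\",\"scroll_id\":\"{scrollId}\"}}";

// Body for DELETE /_search/scroll: frees the scroll context as soon as you are
// done, rather than waiting for the keep-alive to expire.
string BuildClearScrollBody(string scrollId) =>
    $"{{\"scroll_id\":[\"{scrollId}\"]}}";

// Hypothetical placeholder for the _scroll_id taken from the previous response.
var exampleId = "placeholder-scroll-id";
Console.WriteLine(BuildScrollContinuationBody(exampleId));
Console.WriteLine(BuildClearScrollBody(exampleId));
```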

Hi @forloop

Thanks again for the response, really appreciate the support.

So our problem here seems to be:

  • Since version 1 of Elastic/NEST, this is the approach we've been using to do deep pagination. From version 6, we no longer have this option: the method signature has changed and doesn't give us an alternative.

  • We can't use from / size for the Scroll API anymore.

  • Increasing max_result_window won't solve the problem; rather, it freezes the Elastic server.

We've been using Elastic as our backbone for the past 3 years, and this seems to be the first time a given feature has changed so dramatically. I guess my only option is to rephrase my question:

...Since the previous approach is no longer possible,
...in a .NET application that produces raw JSON queries
...and issues these queries via the LowLevel client...

How can we get a page where from + size is larger than 1000?

I think you're misunderstanding, @Paulo_Silva.

Nothing has been removed in Elasticsearch 6.x. You can still perform scroll requests or perform pagination, including deep pagination. Although we recommend not to do deep pagination and have put in a soft configurable limit on the result window size of 10000, you can still change this value to something higher to do deep pagination if you want to.

You can use size in scroll requests as you always have done. You can't use from, and in fact it has had no effect on scroll requests since Elasticsearch 1.2; prior to 1.2, it is unclear what it would have been used for, as discussed on the issue. As already indicated, if you were specifying from in your scroll requests in 5.x, it had no effect on the request. To make these requests work in 6.x as they did in 5.x, you should only need to remove the from parameter.

A frozen server is exactly what the index.max_result_window index setting tries to avoid, and it's why deep pagination is a bad idea!

You mean 10000? You can look into increasing the value of the index.max_result_window index setting, but as you've seen, it may put a massive load on the server for a deep page. The better options are to:

  • Use the scroll API, as previously mentioned. You will deal with scroll responses here rather than pages.
  • Use search_after if you have the sort values of the last document in the previous response.
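A minimal in-memory sketch of the search_after option, assuming the documents carry a unique sortable field, called id here purely for illustration (the data, field name, and helper names are hypothetical). Each request carries the sort values of the previous page's last hit instead of a from offset, so it works well for "next page" navigation but cannot jump straight to an arbitrary page.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical documents with a unique numeric sort key; "id" is an assumed
// field name used as the tiebreaker so every hit has a distinct sort value.
var docs = Enumerable.Range(0, 20000)
    .Select(i => (Id: i, Company: "AMP Services Ltd"))
    .ToList();

// Simulates a search sorted by id ascending. With searchAfter == null it is the
// first page; otherwise only hits whose sort value is strictly greater than the
// last hit of the previous page are returned.
List<(int Id, string Company)> SearchAfter(int? searchAfter, int size) =>
    docs.Where(d => searchAfter == null || d.Id > searchAfter)
        .OrderBy(d => d.Id)
        .Take(size)
        .ToList();

// Body for the next page: same query and sort as before, plus the sort values
// of the previous page's last hit. Note there is no "from" at all.
string BuildSearchAfterBody(int size, string queryJson, int lastId) =>
    $"{{\"size\":{size},\"query\":{queryJson},\"sort\":[{{\"id\":\"asc\"}}],\"search_after\":[{lastId}]}}";

var page1 = SearchAfter(null, 12);              // ids 0..11
var page2 = SearchAfter(page1.Last().Id, 12);   // ids 12..23
Console.WriteLine($"{page2.First().Id}..{page2.Last().Id}"); // 12..23
Console.WriteLine(BuildSearchAfterBody(12, "{\"match_all\":{}}", page1.Last().Id));
```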

Hope that helps :+1:
