ElasticSearch and RestHighLevelClient and fetching more than 10000 items

oreo · April 14, 2020, 11:37am

I'm implementing a REST API using Java RestHighLevelClient. The user specifies the pageNumber and the pageSize of the requested data through the REST parameters.

By default, because of the "max_result_window" limit in Elasticsearch, I can't fetch more than 10,000 items in a single query. I also need real-time responses, so the Scroll API doesn't seem to be an option for me (or is it?).
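For reference, the limit applies to the sum of from and size, not to the page size alone. Here is a minimal sketch of how a pageNumber/pageSize pair maps onto that window, assuming 1-based page numbers and the default index.max_result_window of 10,000. The class and method names are made up for illustration.

```java
final class PageWindow {
    // Default value of the index.max_result_window setting.
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    // Offset ("from") of the first hit on a 1-based page.
    static long fromOffset(int pageNumber, int pageSize) {
        if (pageNumber < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNumber and pageSize must be >= 1");
        }
        return (long) (pageNumber - 1) * pageSize;
    }

    // True when from + size stays within the window, i.e. a plain
    // from/size request for this page would not be rejected.
    static boolean fitsInWindow(int pageNumber, int pageSize, int maxResultWindow) {
        return fromOffset(pageNumber, pageSize) + pageSize <= maxResultWindow;
    }

    public static void main(String[] args) {
        // Page 500 of size 20 ends exactly at hit 10,000; page 501 goes past it.
        System.out.println(fitsInWindow(500, 20, DEFAULT_MAX_RESULT_WINDOW));
        System.out.println(fitsInWindow(501, 20, DEFAULT_MAX_RESULT_WINDOW));
    }
}
```

So with a page size of 20, pages 1 through 500 can be served with plain from/size, and only deeper pages need another mechanism.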

Based on what I've found so far, search_after should be the solution for me. On the other hand, as far as I understand, search_after fits a scenario with 'Previous' and 'Next' buttons. In my scenario there are no such buttons; instead, the user explicitly specifies the pageSize and pageNumber.

My questions:

1. Which search approach (Scroll API, search_after, sliced scroll, etc.) is the best fit for me?
2. I also need a good example of how to implement search_after in Java with RestHighLevelClient. I've spent a lot of time looking for one, but unfortunately haven't found any.
3. Can search_after be used when the user specifies the pageSize and the pageNumber? If yes, how should it be implemented? If no, what should I do instead?

I'd appreciate it if someone could shed some light on this matter. Thanks!
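To make the third question concrete, here is a rough sketch of how search_after could serve a pageNumber/pageSize request: walk pages 1..pageNumber in order and feed the sort values of each page's last hit into the next request. This is an in-memory simulation, not client code. The `searchAfter` method below is a hypothetical stand-in for one search request, and the sort on {timestamp, id} is an assumption. With RestHighLevelClient, the matching pieces would be `SearchSourceBuilder.searchAfter(Object[])` for the cursor and `SearchHit.getSortValues()` for reading it back, with a unique field as the last sort key.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SearchAfterSketch {

    // Each hit is represented only by its sort values: {timestamp, id}.
    // The id is unique and acts as the tiebreaker, so the cursor is unambiguous.
    static int compareSortValues(long[] a, long[] b) {
        int byTimestamp = Long.compare(a[0], b[0]);
        return byTimestamp != 0 ? byTimestamp : Long.compare(a[1], b[1]);
    }

    // Hypothetical stand-in for one search request against an index that is
    // already sorted by {timestamp, id}: returns up to `size` hits that sort
    // strictly after `cursor` (null means "start from the first hit").
    static List<long[]> searchAfter(List<long[]> sortedIndex, long[] cursor, int size) {
        List<long[]> hits = new ArrayList<>();
        for (long[] doc : sortedIndex) {
            if (cursor != null && compareSortValues(doc, cursor) <= 0) {
                continue; // at or before the cursor: already delivered
            }
            hits.add(doc);
            if (hits.size() == size) {
                break;
            }
        }
        return hits;
    }

    // Reaches a 1-based page by requesting pages 1..pageNumber in order and
    // feeding the sort values of each page's last hit into the next request.
    static List<long[]> fetchPage(List<long[]> sortedIndex, int pageNumber, int pageSize) {
        if (pageSize < 1) {
            throw new IllegalArgumentException("pageSize must be >= 1");
        }
        long[] cursor = null;
        List<long[]> page = Collections.emptyList();
        for (int p = 1; p <= pageNumber; p++) {
            page = searchAfter(sortedIndex, cursor, pageSize);
            if (page.isEmpty()) {
                break; // requested page lies beyond the last hit
            }
            cursor = page.get(page.size() - 1);
        }
        return page;
    }

    // Builds n sample hits; consecutive pairs share a timestamp to exercise the tiebreaker.
    static List<long[]> sampleIndex(int n) {
        List<long[]> index = new ArrayList<>();
        for (long i = 0; i < n; i++) {
            index.add(new long[] {i / 2, i});
        }
        return index;
    }

    public static void main(String[] args) {
        List<long[]> page = fetchPage(sampleIndex(25), 3, 10);
        System.out.println(page.size() + " hits, first id " + page.get(0)[1]);
    }
}
```

The obvious cost is that reaching page N takes N requests, so this gets around the result window but deep pages stay slow.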

kolle1986 · May 8, 2020, 11:15pm

I'm looking for exactly the same thing. Did you manage to do this somehow?
I can't believe it is so hard to achieve this basic thing. I have a table (Vaadin grid) that is scrolled randomly, and the offset and limit are passed from the Java backend to the Elasticsearch service. It is a logs table that holds millions of rows; for example, the user can grab the scrollbar and jump to offset 100k with a limit of some 20-30 rows. So I need a random offset and limit every time.

dadoonet · May 9, 2020, 3:39am

IMHO it makes little sense to offer such a thing to the end user.
When you are searching on Google, do you often go to page 265? I guess not.

It's better to offer the user a very good sort experience instead of letting them go through tons of pages to find the relevant information.

My 2 cents.

kolle1986 · May 9, 2020, 11:35am

@dadoonet Thank you for your answer. I'm new to Elasticsearch and some things are not yet clear to me.

Alright, your answer makes sense, but the thing is, this project is in production. We use MySQL, and it turned out that searching over 70 million log rows is now very difficult, almost impossible, especially because data is constantly written to and read from that same table.

Because of that we must change the logs backend, and Elasticsearch seems perfect for it. It is a Java web application (with the Vaadin framework on the front end). Our logs page uses a lazy-loading grid (table), so we cannot control how far down the user drags the scrollbar, and a large offset gets requested if they have previously selected an older date in the search criteria (just one or two days are enough to exceed 10k rows, and that is not rare). Our front-end component and its data provider send a new offset and limit to the backend every time the user scrolls the search results to load more.

In your answer you are suggesting that we definitely need to switch to a pager. That requires some refactoring and changes to the front-end user interface. So is it somehow possible to use the search_after param or even the Scroll API to achieve this without switching to a pager? If not, is there a complete example using RestHighLevelClient and a pager?
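One idea I'm considering, sketched below under heavy assumptions: since the grid mostly asks for consecutive offsets while scrolling, the backend could remember the search_after cursor for each offset it has already reached and resume from the nearest one, instead of starting from the top every time. This is only an in-memory simulation. The class name, the `searchAfter` stand-in, the single numeric sort key, and the chunk size of 1,000 are all made up, and it assumes the result set does not change between requests (for example by bounding the query with a fixed upper timestamp).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class OffsetCursorCache {
    private static final int SKIP_CHUNK = 1_000; // hits fetched per request while skipping ahead

    private final long[] sortedKeys; // in-memory stand-in for the index: unique sort keys, ascending
    // offset -> sort key of the hit just before that offset (the cursor to resume from)
    private final TreeMap<Integer, Long> cursorAtOffset = new TreeMap<>();
    int backendCalls = 0; // counts simulated search requests

    OffsetCursorCache(long[] sortedKeys) {
        this.sortedKeys = sortedKeys;
    }

    // Hypothetical stand-in for one search_after request:
    // up to `size` keys strictly greater than `cursor` (null = from the start).
    private List<Long> searchAfter(Long cursor, int size) {
        backendCalls++;
        List<Long> hits = new ArrayList<>();
        for (long key : sortedKeys) {
            if (cursor != null && key <= cursor) continue;
            hits.add(key);
            if (hits.size() == size) break;
        }
        return hits;
    }

    // Serves an arbitrary offset/limit by resuming from the nearest cached
    // cursor at or before `offset`, skipping forward in chunks if needed.
    List<Long> fetch(int offset, int limit) {
        if (offset < 0 || limit < 1) {
            throw new IllegalArgumentException("offset must be >= 0 and limit >= 1");
        }
        Map.Entry<Integer, Long> known = cursorAtOffset.floorEntry(offset);
        int position = (known == null) ? 0 : known.getKey();
        Long cursor = (known == null) ? null : known.getValue();

        while (position < offset) {
            List<Long> skipped = searchAfter(cursor, Math.min(SKIP_CHUNK, offset - position));
            if (skipped.isEmpty()) {
                return skipped; // offset lies beyond the last hit
            }
            position += skipped.size();
            cursor = skipped.get(skipped.size() - 1);
            cursorAtOffset.put(position, cursor); // remember every position reached
        }

        List<Long> page = searchAfter(cursor, limit);
        if (!page.isEmpty()) {
            // Cursor for the offset right after this page, used when scrolling continues.
            cursorAtOffset.put(offset + page.size(), page.get(page.size() - 1));
        }
        return page;
    }

    public static void main(String[] args) {
        long[] keys = new long[5_000];
        for (int i = 0; i < keys.length; i++) keys[i] = i;
        OffsetCursorCache cache = new OffsetCursorCache(keys);
        cache.fetch(3_000, 20); // cold jump: several skip requests plus the page itself
        cache.fetch(3_020, 20); // continued scroll: a single request
        System.out.println("requests issued: " + cache.backendCalls);
    }
}
```

In this sketch a cold jump to a far offset still costs one request per skipped chunk, but continued scrolling from there costs one request per page.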

Thank you very much once again!

dadoonet · May 10, 2020, 3:15pm

Maybe. I don't know what a pager is.
About search_after, there are some examples here which might help:

https://github.com/elastic/elasticsearch/blob/f4223b6a8fa74918a1831c12b6cceab7f5d237f9/server/src/test/java/org/elasticsearch/search/searchafter/SearchAfterBuilderTests.java#L57

    import java.util.Collections;

    import static org.elasticsearch.search.searchafter.SearchAfterBuilder.extractSortType;
    import static org.elasticsearch.test.EqualsHashCodeTestUtils.checkEqualsAndHashCode;
    import static org.hamcrest.Matchers.containsString;
    import static org.hamcrest.Matchers.equalTo;

    public class SearchAfterBuilderTests extends ESTestCase {
        private static final int NUMBER_OF_TESTBUILDERS = 20;

        private static SearchAfterBuilder randomSearchAfterBuilder() throws IOException {
            int numSearchFrom = randomIntBetween(1, 10);
            SearchAfterBuilder searchAfterBuilder = new SearchAfterBuilder();
            Object[] values = new Object[numSearchFrom];
            for (int i = 0; i < numSearchFrom; i++) {
                int branch = randomInt(9);
                switch (branch) {
                    case 0:
                        values[i] = randomInt();
                        break;
                    case 1:

kolle1986 · May 11, 2020, 11:08am

@dadoonet Thank you once again, I'll try this. By the way, by "pager" I mean the whole paging mechanism: the backend plus the controls in the interface (paging nav buttons, e.g. 1, 2, 3 ... 99 >).

