ES Java matchAllQuery paging size limit

ajmariella · March 24, 2016, 11:23am

I'm using the Java API to run a matchAllQuery(), with .setFetchSource so that only some particular fields are returned.
I am also using .setFrom and .setSize. When I get my initial results back, totalHits = 49,123,345.
But when I page through the results, I get a NullPointerException on the response once I reach result 10001.
Am I doing something wrong?

dadoonet · March 24, 2016, 11:39am

We recently added a limit on deep pagination.
Can you reproduce it, ideally with a plain curl script?

If you can reproduce it, please open an issue for it.

If not, can you share more of your code here? Maybe the NPE is in your own code?

BTW, for deep pagination, use the scroll API instead.
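For reference, the deep-pagination limit is the index setting index.max_result_window: from Elasticsearch 2.1 on, a from/size search is rejected when from + size is greater than that value, which defaults to 10000. The plain-Java sketch below only mirrors that rule on the client side, for example so a web service can refuse an out-of-range page before sending it. The class and method names are hypothetical, and it assumes the index still uses the default window.

```java
// Client-side sketch of the from/size result-window rule.
// Assumption: the index uses the default index.max_result_window of 10000;
// if it was changed, pass the configured value instead.
final class ResultWindow {
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    private ResultWindow() {}

    // True if a from/size request stays inside the window,
    // i.e. from + size <= maxResultWindow. Long math avoids int overflow.
    static boolean fits(int from, int size, int maxResultWindow) {
        if (from < 0 || size < 0) {
            throw new IllegalArgumentException("from and size must be non-negative");
        }
        return (long) from + size <= maxResultWindow;
    }

    public static void main(String[] args) {
        // Hits 9,991 to 10,000: the last page reachable with from/size.
        System.out.println(fits(9_990, 10, DEFAULT_MAX_RESULT_WINDOW));  // true
        // Hit 10,001 (from = 10000, size = 1) falls outside the window.
        System.out.println(fits(10_000, 1, DEFAULT_MAX_RESULT_WINDOW));  // false
    }
}
```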

dadoonet · March 24, 2016, 11:40am

I forgot. A full stacktrace would help a lot.

ajmariella · March 24, 2016, 12:00pm

I added .setScroll and didn't get the error again. But I keep getting results no matter which page number I put in the query. Based on the page number, I set .setFrom((page - 1) * BULK_MAX_RESULTS), where BULK_MAX_RESULTS = 10000.

How do I know when I've reached the end of the data?
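With page-number paging, the end of the data can be worked out from totalHits instead of waiting for an empty or failed response. The arithmetic below is a sketch with hypothetical names, assuming 1-based pages and a fixed page size. Note that with a page size of 10000, page 2 already starts at from = 10000, so it needs hits beyond the first 10,000, which is exactly where the first post ran into trouble.

```java
// Sketch: page-number arithmetic for from/size paging.
// Assumptions: pages are 1-based, pageSize is fixed; names are hypothetical.
final class PageMath {
    private PageMath() {}

    // Offset of the first hit on a page: (page - 1) * pageSize.
    static long fromFor(int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
        return (long) (page - 1) * pageSize;
    }

    // Pages needed to cover totalHits: ceil(totalHits / pageSize).
    static long lastPage(long totalHits, int pageSize) {
        if (totalHits < 0 || pageSize < 1) {
            throw new IllegalArgumentException("totalHits >= 0 and pageSize >= 1 required");
        }
        return (totalHits + pageSize - 1) / pageSize;
    }

    // A page is past the end of the data once its offset reaches totalHits.
    static boolean isPastEnd(int page, int pageSize, long totalHits) {
        return fromFor(page, pageSize) >= totalHits;
    }

    public static void main(String[] args) {
        int pageSize = 10_000; // same value as BULK_MAX_RESULTS in the post above
        System.out.println(fromFor(2, pageSize));            // 10000
        System.out.println(lastPage(25_000, pageSize));      // 3
        System.out.println(isPastEnd(4, pageSize, 25_000));  // true
    }
}
```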

dadoonet · March 24, 2016, 1:48pm

Read this: https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/java-search-scrolling.html
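The linked page describes a loop that keeps requesting the next batch with the scroll id from the previous response; the end of the data is the first batch that comes back with no hits. The sketch below shows only that stopping rule, with hypothetical ScrollSource and ScrollPage types standing in for the real client calls so it runs without a cluster (Java 16+ for the record). It assumes, as with a normal scroll search, that the first response already contains the first batch. In the 2.x Java API the equivalent check is on the length of response.getHits().getHits().

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the scroll loop's stopping rule. ScrollPage and ScrollSource are
// hypothetical stand-ins for the real client calls.
final class ScrollLoop {
    private ScrollLoop() {}

    // One batch of hits plus the scroll id to send with the next request.
    record ScrollPage(String scrollId, List<String> hits) {}

    interface ScrollSource {
        ScrollPage first();               // initial search with a scroll timeout
        ScrollPage next(String scrollId); // follow-up scroll request
    }

    // Drains the scroll; the first empty batch marks the end of the data.
    static List<String> readAll(ScrollSource source) {
        List<String> all = new ArrayList<>();
        ScrollPage page = source.first();
        while (!page.hits().isEmpty()) {
            all.addAll(page.hits());
            page = source.next(page.scrollId());
        }
        return all;
    }

    // In-memory ScrollSource that serves `data` in batches of `batchSize`.
    static ScrollSource inMemory(List<String> data, int batchSize) {
        return new ScrollSource() {
            private int offset = 0;

            @Override public ScrollPage first() { return next("scroll-1"); }

            @Override public ScrollPage next(String scrollId) {
                int end = Math.min(offset + batchSize, data.size());
                List<String> batch = List.copyOf(data.subList(offset, end));
                offset = end;
                return new ScrollPage(scrollId, batch);
            }
        };
    }

    public static void main(String[] args) {
        List<String> docs = List.of("a", "b", "c", "d", "e");
        System.out.println(readAll(inMemory(docs, 2))); // [a, b, c, d, e]
    }
}
```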

ajmariella · March 24, 2016, 2:12pm

I'm writing a web service to retrieve the data, so I can't send all of it at once; I'm relying on the user to enter a page number to retrieve each part. Is there another way to retrieve the data besides scrolling?

nik9000 · March 24, 2016, 2:30pm

Deep paging is always bad. It works in relational databases, but poorly. In Elasticsearch it works even more poorly. That is why we don't allow it by default.
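To put numbers on "even more poorly": in a sharded from/size search, every shard has to return its top from + size hits, and the coordinating node sorts all of them just to keep size of them. The arithmetic below is a sketch with hypothetical names, assuming the default query-then-fetch search and ignoring the work each shard does internally.

```java
// Back-of-the-envelope cost of one from/size page in a distributed search.
// Assumption: every shard returns its top (from + size) hits and the
// coordinating node merges all of them before returning `size` hits.
final class DeepPagingCost {
    private DeepPagingCost() {}

    // Entries the coordinating node has to sort for one page.
    static long entriesSorted(int shards, long from, int size) {
        return shards * (from + size);
    }

    // Entries that are sorted and then discarded; only `size` are returned.
    static long entriesDiscarded(int shards, long from, int size) {
        return entriesSorted(shards, from, size) - size;
    }

    public static void main(String[] args) {
        // 5 shards, asking for hits 10,001 to 10,010.
        System.out.println(entriesSorted(5, 10_000, 10));    // 50050
        System.out.println(entriesDiscarded(5, 10_000, 10)); // 50040
    }
}
```

The first page (from = 0) of the same query sorts only 5 × 10 = 50 entries, so the cost grows linearly with how deep the user pages.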

One way to work around this is to sort by something consistent and have the user ask for all results after the last one they saw. For example, if you are paging through results by some auto-incrementing id, have them ask for documents >= the last one. Or, if you are paging by date, sort by date and id and have them ask for everything after the last (date, id) pair. The advantage of this approach is that you get to use the index to skip all the earlier documents rather than visiting them all. The disadvantage is that you have to sort. If you were sorting anyway, then no big deal. If not, well, I dunno what is right then.
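A minimal in-memory sketch of the date-plus-id variant described above: sort by (date, id), return one page, and have the client send back the last (date, id) it saw as a cursor. The Doc record and method names are hypothetical, and against a real index the filter and sort would run inside the search request rather than in a Java stream (Java 16+ for records and Stream.toList). It uses a strict "after" comparison, so the last document of one page is not repeated on the next; with >= the client would have to drop that duplicate itself. Later Elasticsearch releases (5.0 and up) expose this pattern directly as the search_after parameter.

```java
import java.time.LocalDate;
import java.util.Comparator;
import java.util.List;

// In-memory sketch of cursor ("keyset") paging on a (date, id) sort.
// Doc and the method names are hypothetical.
final class KeysetPaging {
    private KeysetPaging() {}

    record Doc(LocalDate date, String id) {}

    // Total order on (date, id); the id breaks ties between equal dates.
    static final Comparator<Doc> ORDER =
            Comparator.comparing(Doc::date).thenComparing(Doc::id);

    // Up to `size` docs strictly after `last`, or the first page when `last`
    // is null. No request ever needs a large `from`.
    static List<Doc> pageAfter(List<Doc> docs, Doc last, int size) {
        return docs.stream()
                .filter(d -> last == null || ORDER.compare(d, last) > 0)
                .sorted(ORDER)
                .limit(size)
                .toList();
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc(LocalDate.of(2016, 3, 24), "b"),
                new Doc(LocalDate.of(2016, 3, 24), "a"),
                new Doc(LocalDate.of(2016, 3, 25), "c"));
        List<Doc> first = pageAfter(docs, null, 2);
        System.out.println(first.get(1).id());  // b
        // The client sends back the last doc of the page as its cursor.
        List<Doc> second = pageAfter(docs, first.get(first.size() - 1), 2);
        System.out.println(second.get(0).id()); // c
    }
}
```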

ajmariella · March 25, 2016, 11:30am

Is there a way with the Java API to get only the unique values of a particular field?

Topic	Category	Replies	Views	Activity
Java search API pagination question/issue	Elasticsearch	3	2583	July 6, 2017
New Elasticsearch Java client API	Elasticsearch	3	1098	July 30, 2022
Elastic Pagination	Elasticsearch	5	420	November 29, 2018
Elasticsearch java API client trackTotalHits	Elasticsearch / language-clients	3	1034	September 27, 2023
Result window is too large, from + size must be less than or equal to: [10000] but was [10050]	Elasticsearch	9	25396	April 23, 2018