ES Java matchAllQuery paging size limit


#1

I'm using the Java API to run a matchAllQuery(), with .setFetchSource so that only particular fields are returned.
I am also using .setFrom and .setSize. When I get my initial results back, I get totalHits = 4,9123,345.
But when I page through the results, I get a NullPointerException on the response once I get to 10001.
Am I doing something wrong?


(David Pilato) #2

We recently added a limit for deep pagination.
Is it something you can reproduce? Potentially with a pure curl script?

If you can reproduce it, please open an issue with that.

If not, can you share more code here? Maybe you have an NPE in your own code?

BTW, for deep pagination, use the scroll API instead.
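
To make the limit concrete, here is a minimal sketch of the arithmetic behind it. It assumes the default result window of 10,000 (the `index.max_result_window` setting, which is not named in this thread) and that a request is accepted only when from + size stays within that window. The class and method names are hypothetical, and the code does not call Elasticsearch at all.

```java
// Sketch only: checks whether a from/size page fits the assumed result window.
final class ResultWindow {
    // Assumption: default value of index.max_result_window; an index may override it.
    static final int MAX_RESULT_WINDOW = 10_000;

    /** Offset to pass to setFrom for a 1-based page number. */
    static long fromOffset(int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
        return (long) (page - 1) * pageSize;
    }

    /** True when from + size does not exceed the assumed window. */
    static boolean fitsWindow(int page, int pageSize) {
        return fromOffset(page, pageSize) + pageSize <= MAX_RESULT_WINDOW;
    }

    public static void main(String[] args) {
        System.out.println(fitsWindow(1, 10_000));  // true:  0 + 10000
        System.out.println(fitsWindow(2, 10_000));  // false: 10000 + 10000
        System.out.println(fitsWindow(100, 100));   // true:  9900 + 100
        System.out.println(fitsWindow(101, 100));   // false: 10000 + 100
    }
}
```

With a page size of 10,000, only the first page fits, which matches a failure showing up at result 10001.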


(David Pilato) #3

I forgot: a full stack trace would help a lot.


#4

I added .setScroll and I didn't get the error again. But I keep getting results no matter which page number I pass to the query. Based on the page number, I set .setFrom((page - 1) * BULK_MAX_RESULTS), where BULK_MAX_RESULTS = 10000.

How do I know when I get to the end of the data?


(David Pilato) #5

Read this: https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/java-search-scrolling.html


#6

I'm writing a web service to retrieve the data, so I can't send all the data at once. I'm relying on the user to input a page number in order to retrieve data. Is there another way to retrieve the data besides scrolling?


(Nik Everett) #7

Deep paging is always bad. It works in relational databases, but poorly. In Elasticsearch it works even more poorly, which is why we don't allow it by default.

One way to work around this is to sort by something consistent and have the user ask for all results after their last one. Like, if you are paging through results by some auto-incrementing id, then have them ask for documents with an id > the last one. Or if you are sorting by date, then sort by (date, id) and have them ask for stuff after the last (date, id) pair.

The advantage of doing things this way is that you get to use the index to skip all the earlier documents rather than visiting them all. The disadvantage is that you have to sort. If you were sorting anyway, well, then no big deal. If not, well, I dunno what is right then.
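
The idea above can be sketched without Elasticsearch at all. The example below is only an illustration under stated assumptions: `Doc`, `ORDER`, and `nextPage` are hypothetical names, and an in-memory `TreeSet` stands in for the sorted index. The caller passes the last (date, id) pair it saw and gets back the documents strictly after it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Sketch only: "results after the last one" paging over an in-memory sorted set.
final class KeysetPaging {
    /** Hypothetical document: a date value plus a unique id used as tiebreaker. */
    static final class Doc {
        final long date;
        final long id;
        Doc(long date, long id) { this.date = date; this.id = id; }
    }

    /** Consistent sort order: by date, then by id. */
    static final Comparator<Doc> ORDER =
            Comparator.comparingLong((Doc d) -> d.date).thenComparingLong(d -> d.id);

    /**
     * Returns up to pageSize documents strictly after `last` in (date, id) order.
     * Pass null as `last` to get the first page. The set must be built with ORDER.
     */
    static List<Doc> nextPage(NavigableSet<Doc> index, Doc last, int pageSize) {
        // tailSet(last, false) seeks past everything up to and including `last`.
        NavigableSet<Doc> rest = (last == null) ? index : index.tailSet(last, false);
        List<Doc> page = new ArrayList<>();
        for (Doc d : rest) {
            if (page.size() == pageSize) break;
            page.add(d);
        }
        return page;
    }

    public static void main(String[] args) {
        NavigableSet<Doc> index = new TreeSet<>(ORDER);
        index.add(new Doc(20, 4));
        index.add(new Doc(10, 1));
        index.add(new Doc(30, 5));
        index.add(new Doc(10, 2));
        index.add(new Doc(20, 3));

        Doc last = null;
        while (true) {
            List<Doc> page = nextPage(index, last, 2);
            if (page.isEmpty()) break;          // an empty page means the data is exhausted
            StringBuilder ids = new StringBuilder("page:");
            for (Doc d : page) ids.append(' ').append(d.id);
            System.out.println(ids);
            last = page.get(page.size() - 1);   // the client sends this pair back next time
        }
    }
}
```

A web service built this way would hand the last (date, id) pair back to the client instead of a page number, so each request starts from a seek rather than from an offset that has to be counted out.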


#8

Is there a way with the Java API to get only the unique values of a particular field?

