I'm running a search query against Elasticsearch with the PHP client, using sorting together with from/size. I use from/size for pagination when displaying the data, but the sorting doesn't seem to work once I move to other pages. It looks as if the sort isn't applied before the range of data (from/size) is selected.
Does anyone know how to do this the way you would in MySQL, where you just use ORDER BY and LIMIT 0,100, etc.?
I'm using from and size with the search API so I can paginate. Example: I have a list of members that contains a country field. There are about 200 records, so with from=0 and size=25 I'm on the 1st page of 8 in total, and I can move to other pages by incrementing the from parameter. As soon as I add a sort on country in desc order, I would expect records with country=ZW on the first page and country=A1 on the last page. But when I go to the last page, or any other page, with country desc, it still shows ZW.
How can I make it work like MySQL, where I can set ORDER BY country and LIMIT 0,25 and it actually sorts first before applying the limit?
It seems that Elasticsearch applies from/size first and then does the sorting?
Which version of elasticsearch is it? How is the field country mapped? When you get back results, each hit should contain an array called sort. These are the values that are used for sorting. What values do you see there?
Hi, sorry, I was actually passing an incorrect "from", so it always showed the same country. But I found another issue. I'm using Elasticsearch 2.3.1 and PHP client 2.0. This is the exception I got when I tried to go to the last page, 50851. It seems the search API can't handle more than 10000 records? Does that mean it can't return more than that, and that I need to use the scroll API?
{"error":{"root_cause":[{"type":"query_phase_execution_exception","reason":"Result window is too large, from + size must be less than or equal to: [10000] but was [1271275]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level parameter."}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"daily_summary_mobile_geo-2017","node":"lyb1R3fjQA-_MERcQw4upA","reason":{"type":"query_phase_execution_exception","reason":"Result window is too large, from + size must be less than or equal to: [10000] but was [1271275]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level parameter."}}]},"status":500}
Please read the section about deep pagination in the guide. It explains why we decided to prevent users from accidentally paginating through too many pages.
The search API can handle more than 10000 records, but if your users feel compelled to page through thousands of pages, it might be a good indication of a problem with the search UI: it doesn't let them quickly narrow down their searches and sort the results so that they can get to a more manageable subset of results. For example, the search UI might be missing faceted navigation by country or the ability to sort in reverse alphabetical order, and so on.
So, because deep pagination is very expensive, can happen inadvertently when crawler bots find your system, and is not very useful for end users, we have disabled it by default. This is a somewhat common approach in search engines. For example, try paging through this search https://www.google.com/search?q=search and try to get to page 100.
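For reference, the arithmetic behind the error above can be sketched in a few lines. This is Python only for brevity, and the helper names are made up for illustration (they are not part of any Elasticsearch client); the page size of 25, the page number 50851 and the limit of 10000 are the ones from this thread.

```python
# Hypothetical helpers for illustration; not part of any Elasticsearch client.
DEFAULT_MAX_RESULT_WINDOW = 10000  # default index.max_result_window, as quoted in the error


def page_to_from(page, size):
    """Translate a 1-based page number into the 'from' offset."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return (page - 1) * size


def build_search_body(page, size, sort_field, order="desc"):
    """Build a search body with sort + from/size.

    Raises ValueError when from + size would exceed the default result window,
    which is the condition the index itself rejects.
    """
    offset = page_to_from(page, size)
    window = offset + size
    if window > DEFAULT_MAX_RESULT_WINDOW:
        raise ValueError(
            "from + size must be <= %d but was %d" % (DEFAULT_MAX_RESULT_WINDOW, window)
        )
    return {
        "from": offset,
        "size": size,
        "sort": [{sort_field: {"order": order}}],
    }
```

With size=25, page 400 is the last one that fits under the default limit (from=9975, from + size = 10000). Page 50851 gives from + size = 1271275, which is the number in the exception.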
I see. I just want to check about the scroll API, though. I'm looking at it and it doesn't seem as easy as the search API. It also seems to go through some kind of loop? Can the scroll API help in my case if I still want to show the last page? It seems I can only specify the size, and then it loops through using the scroll id. Does sorting work with the scroll API, and can it be combined with from/size?
The scroll API is typically used without sorting because it is meant for when you need all pages. If you need only one page, there is absolutely no reason to use scroll. There is no magic there: deep sorting is deep sorting, and it costs the same with scroll. Scrolling through multiple pages only saves some time because the sort doesn't have to be repeated for every page that you get back. But if you get back only one page, you are not saving anything.
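If you do need every page, the loop you noticed looks roughly like this. It is a sketch that assumes the official Python client (the `elasticsearch` package) and its `search`, `scroll` and `clear_scroll` calls with a `body` argument; parameter names have shifted between client versions, so check it against the one you run. `scroll_all` is a made-up helper name, and the PHP client follows the same search-then-scroll pattern.

```python
def scroll_all(client, index, size=1000, keep_alive="1m"):
    """Yield every hit in the index, fetching `size` hits per round trip.

    `client` is assumed to be an elasticsearch.Elasticsearch instance
    (or anything exposing the same search/scroll/clear_scroll methods).
    """
    # The first request opens the scroll context; sorting by _doc (index order)
    # avoids the expensive deep sort.
    resp = client.search(
        index=index,
        scroll=keep_alive,
        body={"size": size, "sort": ["_doc"]},
    )
    scroll_id = resp["_scroll_id"]
    try:
        # An empty batch of hits means the scroll is exhausted.
        while resp["hits"]["hits"]:
            for hit in resp["hits"]["hits"]:
                yield hit
            resp = client.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = resp["_scroll_id"]
    finally:
        # Release the scroll context on the server once we are done.
        client.clear_scroll(scroll_id=scroll_id)
```

Note that there is no from parameter here: a scroll always starts at the beginning and walks forward, so it cannot jump straight to the last page.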