Could you provide a full recreation script as described in About the Elasticsearch category? It will help to better understand what you are doing. Please try to keep the example as simple as possible.
A full reproduction script will help readers to understand, reproduce and if needed fix your problem. It will also most likely help to get a faster answer.
I use the above query on a Twitter data dump to see how many people have used a certain hashtag. For example, if I search for 'nike', it shows nike, nikewomen, etc., and who has used each hashtag. Below is the sample result.
In the above result you can see that nike is used by more people; I have displayed only 4 to keep the output light. Is there a way to paginate the results, so that when I pass the pagination parameters I get the next set of users, from 4 to 8?
Partitions aren't designed with a global sort order in mind (i.e. the users returned in partition 2 aren't guaranteed to be any more or less popular than those returned in partition 1). Global sort orders on high-cardinality fields like UserIds are hard to reason about in a distributed system where each shard or index has only a small percentage of all the docs.
Partitions are a coping strategy for this problem. By examining arbitrary sub-groupings of terms independently of each other you can attempt to compute things like the top N of something within just that subgroup rather than attempting this analysis across the whole data.
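To make the partition idea concrete, here is a minimal sketch of the request bodies you could send, one per partition. It uses the `include` option of the terms aggregation with `partition` and `num_partitions`. The field names `hashtags` and `user_id`, the prefix query, and the choice of 20 partitions are assumptions for illustration, not taken from your mapping or query.

```python
import json

def partitioned_users_query(hashtag, partition, num_partitions, size=4):
    """Build a search body that aggregates users from one partition only.

    Field names ("hashtags", "user_id") are hypothetical; adjust to your mapping.
    """
    return {
        "size": 0,  # we only want aggregation buckets, not hits
        "query": {"prefix": {"hashtags": hashtag}},
        "aggs": {
            "users": {
                "terms": {
                    "field": "user_id",
                    # Only terms that fall into this partition are considered
                    "include": {
                        "partition": partition,
                        "num_partitions": num_partitions,
                    },
                    "size": size,
                }
            }
        },
    }

# One request body per partition; num_partitions=20 is an arbitrary choice here
num_partitions = 20
bodies = [partitioned_users_query("nike", p, num_partitions) for p in range(num_partitions)]
print(json.dumps(bodies[0], indent=2))
```

Stepping through partitions 0 to 19 visits every user once, but the users in one partition are not ranked relative to those in another, so this gives you exhaustive coverage rather than a globally sorted list.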
What's the end goal and what business problem are you trying to solve?
I'm unsure what the use is of a sorted-by-popularity list of all users who have ever mentioned #nike.
We can discuss alternative approaches that would support this objective, but it's worth understanding first whether that is really the requirement.
My end goal, as I mentioned earlier, is to scroll through aggregated results. I am writing a complex search query where the user types a hashtag and sees all the users who have posted with that hashtag, as in the #nike example. The above query shows me 2 results, #nike and #nikewomen, with 4 users each. On my website there is a "show all" button that lets the user see all the users in #nike, so I need to scroll through the aggregated results; it is a kind of pagination.
"Deep pagination" for an arbitrary query on a distributed system is expensive which is why Google won't let you page beyond a certain number of results for a given query.
If you really need to provide exhaustive results to your end users then you may be forced to reconsider how you physically arrange the data to optimise access for this use case. You may need to pre-aggregate data to keep related information together, e.g. maintain a single document per user with a list of all the hashtags they've ever used and how frequently they were used.
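As a rough sketch of that pre-aggregation step, the snippet below folds raw tweets into one document per user with per-hashtag counts. The tweet shape (`user`, `hashtags`) and the output document layout are assumptions for illustration only; in practice you would index these user documents and keep them updated as new tweets arrive.

```python
from collections import Counter, defaultdict

def build_user_docs(tweets):
    """Collapse tweets into one document per user with hashtag usage counts.

    The input and output shapes are hypothetical, chosen for this example.
    """
    counts = defaultdict(Counter)
    for tweet in tweets:
        for tag in tweet["hashtags"]:
            # Normalise case so "Nike" and "nike" count as the same hashtag
            counts[tweet["user"]][tag.lower()] += 1
    return [
        {
            "user": user,
            "hashtags": [{"tag": t, "count": c} for t, c in sorted(tags.items())],
        }
        for user, tags in sorted(counts.items())
    ]

# Made-up sample data
tweets = [
    {"user": "alice", "hashtags": ["nike", "running"]},
    {"user": "bob", "hashtags": ["nike"]},
    {"user": "alice", "hashtags": ["Nike"]},
]
user_docs = build_user_docs(tweets)
for doc in user_docs:
    print(doc)
```

With one document per user, "show all users for #nike" becomes an ordinary search over user documents, which can be paged like any other result list instead of paging through aggregation buckets.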