Terms aggregation with partition not working

tushar_chevulkar · March 12, 2018, 11:17am

When I run the query below, it executes but the partition filter does not work.

I really don't know where I am going wrong. Can anyone help me with this?

{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase_prefix": {
            "mentions": {
              "query": "ABC",
              "max_expansions": 10
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "genres": {
      "terms": {
        "field": "mentions",
        "size": 1
      },
      "aggs": {
        "user_part": {
          "terms": {
            "field": "user.username",
            "size": 4,
            "include": {
              "partition": 1,
              "num_partitions": 5
            }
          }
        }
      }
    }
  }
}

dadoonet · March 12, 2018, 12:17pm

Could you provide a full recreation script as described in About the Elasticsearch category? It will help to better understand what you are doing. Please try to keep the example as simple as possible.

A full reproduction script will help readers to understand, reproduce and if needed fix your problem. It will also most likely help to get a faster answer.

tushar_chevulkar · March 12, 2018, 12:37pm

I use the above query on a Twitter data dump to see how many people have used a certain hashtag. For example, if I search for 'nike' it shows nike, nikewomen, etc. and who has used each hashtag. Below is a sample result.

{
	"took": 82,
	"timed_out": false,
	"_shards": {
		"total": 5,
		"successful": 5,
		"failed": 0
	},
	"hits": {
		"total": 915,
		"max_score": 0,
		"hits": []
	},
	"aggregations": {
		"genres": {
			"doc_count_error_upper_bound": 15,
			"sum_other_doc_count": 858,
			"buckets": [{
				"key": "nike",
				"doc_count": 72,
				"users_with_hastag": {
					"doc_count_error_upper_bound": 0,
					"sum_other_doc_count": 34,
					"buckets": [{
							"key": "A",
							"doc_count": 14
						},
						{
							"key": "B",
							"doc_count": 9
						},
						{
							"key": "C",
							"doc_count": 8
						},
						{
							"key": "D",
							"doc_count": 7
						}
					]
				}
			}]
		}
	}
}

In the above result you can see that nike is used by more people; I have displayed only 4 to keep the output light. Is there a way to paginate the results, so that when I pass pagination parameters I get the next set of users, from 4 to 8?

Mark_Harwood · March 12, 2018, 1:34pm

Partitions aren't designed with a global sort order in mind (i.e. the users returned in partition 2 aren't guaranteed to be any more or less popular than those returned in partition 1). Global sort orders on high-cardinality fields like UserIds are hard to reason about in a distributed system where each shard or index has only a small percentage of all the docs.
Partitions are a coping strategy for this problem. By examining arbitrary sub-groupings of terms independently of each other you can attempt to compute things like the top N of something within just that subgroup rather than attempting this analysis across the whole data.
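
As a rough illustration of the point above, here is a minimal Python sketch of how a client could request each partition separately and only then impose a global order itself. The field names `mentions` and `user.username` come from the query in this thread; the aggregation name `users`, the `term` query, the default `size`, and all function names are assumptions made for the example, not part of any Elasticsearch client. The sketch only builds request bodies and merges already-fetched responses; it assumes `size` is large enough for every term in a partition to be returned.

```python
from collections import Counter


def build_partition_request(hashtag, partition, num_partitions, size=1000):
    """Build a search body that aggregates users for one partition only.

    `size` should be large enough to return every term that falls into
    the partition (an assumption of this sketch).
    """
    return {
        "size": 0,  # no hits needed, only the aggregation
        "query": {"term": {"mentions": hashtag}},
        "aggs": {
            "users": {  # hypothetical aggregation name
                "terms": {
                    "field": "user.username",
                    "size": size,
                    "include": {
                        "partition": partition,
                        "num_partitions": num_partitions,
                    },
                }
            }
        },
    }


def merge_partitions(responses, agg_name="users"):
    """Combine buckets from all partition responses and sort them globally.

    Each term belongs to exactly one partition, so the Counter simply
    collects the buckets; ties are broken by key for a stable order.
    """
    counts = Counter()
    for response in responses:
        for bucket in response["aggregations"][agg_name]["buckets"]:
            counts[bucket["key"]] += bucket["doc_count"]
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def page(items, page_number, page_size):
    """Return one zero-based page of an already sorted list."""
    start = page_number * page_size
    return items[start:start + page_size]
```

With this approach the sort order is only meaningful after all partitions have been fetched, which is why partitions alone do not give "users 4 to 8" directly.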

tushar_chevulkar · March 12, 2018, 1:41pm

Is there any specific example that can solve this?

Mark_Harwood · March 12, 2018, 3:54pm

What's the end goal and what business problem are you trying to solve?

I'm unsure what the use is of a sorted-by-popularity list of all users who have ever mentioned #nike.
We can discuss alternative approaches that would support this objective, but it's worth understanding whether that is really the requirement first.

tushar_chevulkar · March 12, 2018, 4:22pm

My end goal, as I mentioned earlier, is to scroll through aggregated results. I am writing a complex search query where the user types a hashtag and sees all the users who have posted with that hashtag, as in the #nike example. The above query shows me 2 results, #nike and #nikewomen, with 4 users each. On my website there is a "show all" button that lets the user see all the users for #nike, so I need to scroll through the aggregated results; it's a kind of pagination.

Mark_Harwood · March 12, 2018, 5:22pm

"Deep pagination" for an arbitrary query on a distributed system is expensive which is why Google won't let you page beyond a certain number of results for a given query.

If you really need to provide exhaustive results to your end users then you may be forced to reconsider how you physically arrange the data to optimise access for this use case. You may need to pre-aggregate data to keep related information local, e.g. maintain a single document per user with a list of all the hashtags they've ever used and how frequently they were used.
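
A minimal Python sketch of the pre-aggregation idea, assuming tweets shaped like the data in this thread (a `user.username` field and a `mentions` list). The output document layout (`_id`, `username`, `hashtags` with `tag` and `count`) and the function name are hypothetical choices for illustration, not a prescribed mapping.

```python
from collections import Counter, defaultdict


def build_user_docs(tweets):
    """Collapse raw tweets into one document per user.

    Each resulting document lists every hashtag the user has mentioned
    together with how many times it was used.
    """
    per_user = defaultdict(Counter)
    for tweet in tweets:
        username = tweet["user"]["username"]
        for tag in tweet.get("mentions", []):
            per_user[username][tag] += 1

    return [
        {
            "_id": username,  # one document per user (hypothetical id choice)
            "username": username,
            "hashtags": [
                {"tag": tag, "count": count}
                for tag, count in sorted(counts.items())
            ],
        }
        for username, counts in sorted(per_user.items())
    ]
```

Once the data is stored this way, "all users who used #nike" becomes an ordinary search over user documents rather than an aggregation over every tweet, so regular search pagination can apply.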

system · April 9, 2018, 5:22pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
