Can you help me reconcile this thread with what we discussed here:
On that thread, it sounded like I could define a bucket up front (for a given account), and then documents added to that bucket would be held together on a single shard.
If you use a routing key, all documents with the same key go to the same shard.
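To illustrate why the same key always lands on the same shard, here is a minimal sketch, assuming a simplified rule of the form shard = hash(routing key) % number of primary shards. Elasticsearch uses its own hash function internally; `zlib.crc32` is only a stand-in here, and `pick_shard` is a hypothetical helper, not an Elasticsearch API.

```python
import zlib


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Hypothetical helper: map a routing key to a shard number deterministically."""
    # zlib.crc32 is a stand-in hash, not the one Elasticsearch actually uses.
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


# Every document routed with the key "7" resolves to the same shard number.
shards = {pick_shard("7", 5) for _ in range(3)}
print(len(shards))  # number of distinct shards chosen for the same key
```

Because the mapping depends only on the key and the shard count, the shard for a given key is stable as long as the number of primary shards does not change.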
Buckets are not stored: they are computed live, as part of an aggregation, when you run the search query.
If you pass a routing key with the search request, only the shard that key maps to is used to run the aggregation.
So if your documents have a userId field and you use userId as the routing key, then to compute an aggregation for user 7 only, you can pass routing=7, add a bool query with a filter on userId=7, and run a terms aggregation on whichever field you want. (The filter is still needed because other users' documents can live on the same shard.)
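As a rough sketch of what such a request could look like, the snippet below only builds the search path and JSON body; it does not talk to a cluster. The index name `docs`, the aggregated field `category`, the aggregation name `by_field`, and the helper `build_user_terms_request` are all assumptions made for illustration.

```python
import json
from urllib.parse import urlencode


def build_user_terms_request(index: str, user_id: int, agg_field: str):
    """Hypothetical helper: return (path, body) for a routed, filtered terms aggregation."""
    # routing=<user_id> restricts the search to the shard that key maps to.
    path = f"/{index}/_search?" + urlencode({"routing": user_id})
    body = {
        "size": 0,  # only the aggregation result is wanted, not the hits
        "query": {
            "bool": {
                # Other users may share the shard, so still filter on userId.
                "filter": [{"term": {"userId": user_id}}]
            }
        },
        "aggs": {
            "by_field": {"terms": {"field": agg_field}}
        },
    }
    return path, body


path, body = build_user_terms_request("docs", 7, "category")
print("POST", path)
print(json.dumps(body, indent=2))
```

The printed path and body can be sent with any HTTP client; the routing value goes in the query string, while the filter and the aggregation go in the body.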
Now, what problem are you actually trying to solve?
This is a continuation of the thought started here.
In our earlier conversations here, I had the impression that buckets had a persistence aspect. From reviewing the docs and your comments on this thread, it appears I was mistaken. It appears that routing + shards is the only path for me.
I suppose the next question becomes: when I do bucket aggregations, how do I make sure ES limits itself to a single shard? (Does routing apply to bucket aggregations?)