We have a requirement to store an array of 100,000 users in an Elasticsearch field. During search, we need to match if a user exists in that array and return the corresponding document.
Is it possible to achieve this using hashing or any other approach?
It sounds like this is a user permissions field that you filter on. If so, I also assume you will be adding and/or removing users on a regular basis.
In that case you should be aware that updating very large documents (which this could be) is expensive, because every update reindexes the whole document. One way to handle this that I have seen in the past is to use a parent-child relationship where the parent is the document and the children hold the permitted users. This does complicate querying, as you would need to add a has_child query to every query you run, which would have a performance impact. You would need to benchmark to see how large that impact is, but note that single very large documents can also have negative performance side effects. Having potentially 100,000 child documents per parent may not be optimal either, so a workaround could be to create a fixed number of child documents and hash users into them. With 100 child documents per parent, each would hold an array of roughly 1,000 users. This results in smaller documents that are cheaper to update and modify.
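To make the bucketing idea concrete, here is a minimal sketch in Python. It is not a tested implementation: the relation name `user_bucket`, the field name `users` and the child ID scheme are all made up for illustration. It also assumes the index has a join field mapping and that `users` is mapped as a keyword. It only builds the bucket number, a child document ID and the query body, and does not talk to a cluster.

```python
import hashlib

# Assumption: 100 buckets per parent, as in the example above
# (100 child documents x roughly 1,000 users each).
NUM_BUCKETS = 100


def bucket_for(user_id: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a user ID to a stable bucket number in [0, num_buckets)."""
    # Use a cryptographic digest rather than Python's built-in hash(),
    # which is randomized per process and so not stable across runs.
    digest = hashlib.sha1(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def child_id(parent_id: str, user_id: str) -> str:
    """Hypothetical ID scheme for the child document holding this user."""
    return f"{parent_id}-users-{bucket_for(user_id)}"


def permission_filter(user_id: str) -> dict:
    """Query clause matching parents with a child that lists this user."""
    return {
        "has_child": {
            "type": "user_bucket",  # hypothetical child relation name
            "query": {"term": {"users": user_id}},  # hypothetical field name
        }
    }
```

With something like this, adding or removing a user only touches the single child document returned by `child_id(parent_id, user_id)`, which holds about 1,000 entries instead of 100,000. The clause from `permission_filter` would go into the filter section of each search. Keep in mind that child documents must be indexed with routing set to the parent ID so they end up on the same shard as the parent.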