Master node: 1 (16 CPUs, 128 GB RAM), 32 GB allocated to heap
Data nodes: 6 (each 16 CPUs, 128 GB RAM), 32 GB allocated to heap on every node
Indices: 9
Total documents: 48,826,456
Stored size: 2 TB
The only way to make an inherently slow task faster is through parallelism.
Use routing to organise all data related to a given profile onto the same machine. Then send routed searches that target those specific machines, each query containing only the profile IDs known to be stored there, and fire those searches in parallel from your client (sketched below).
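A minimal sketch of that idea, assuming the elasticsearch-py 8.x client; the "profiles" index name, the "profile_id" field, and the endpoint URL are illustrative assumptions, not taken from your setup:

```python
# Sketch only: route by profile ID at index time, then search with the
# same routing value so only the owning shard/node handles the query.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed endpoint

# Index with routing so all documents for one profile land on the same shard.
es.index(
    index="profiles",
    id="doc-1",
    routing="profile-42",  # routing key = profile ID (assumed field)
    document={"profile_id": "profile-42", "field": "value"},
)

# Search with the same routing value; only the shard that owns it is queried.
resp = es.search(
    index="profiles",
    routing="profile-42",
    query={"term": {"profile_id": "profile-42"}},
)
print(resp["hits"]["total"])
```

You would issue many such routed searches concurrently from the client (e.g. with a thread pool), one per routing key or per group of profile IDs that share a routing target.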
Without being quite so targeted/optimised, you could try breaking a single search into several smaller bundles of profile IDs and running them in parallel. Each node may end up looking for profile IDs it doesn't hold, but this can still squeeze more speed out of each server if it handles simultaneous requests well (see the sketch after this paragraph).
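A sketch of that simpler approach, again assuming elasticsearch-py 8.x; the index name, field name, chunk size, and worker count are all assumptions you would tune for your own cluster:

```python
# Sketch only: split one big terms query into smaller bundles of profile IDs
# and run the bundles concurrently from the client.
from concurrent.futures import ThreadPoolExecutor
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed endpoint

def chunks(ids, size):
    """Yield successive bundles of profile IDs."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]

def search_bundle(bundle):
    # Each bundle becomes one smaller, faster terms query.
    return es.search(
        index="profiles",
        query={"terms": {"profile_id": bundle}},
        size=10_000,
    )

profile_ids = [f"profile-{n}" for n in range(50_000)]  # example input

# Run the smaller searches in parallel from the client.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(search_bundle, chunks(profile_ids, 1_000)))

total_hits = sum(r["hits"]["total"]["value"] for r in results)
print(total_hits)
```

The chunk size and number of workers are a trade-off: more, smaller bundles give you more parallelism but also more per-request overhead on the cluster.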