We have huge amount of data (5Billion records, 3TB in size) organized in
parent / child type in one index to enable the joins. My first question is,
how should I allocate shards for this big index in order to make the
parent/child query more efficient? Right now doing queries will cause out
of memory on several nodes, and I have 7 VMs, with 64GMem, and 1T disk.
Each Es has 32Gmem allocated to it. The index has 20 shards.
We have huge amount of data (5Billion records, 3TB in size) organized in
parent / child type in one index to enable the joins. My first question is,
how should I allocate shards for this big index in order to make the
parent/child query more efficient? Right now doing queries will cause out
of memory on several nodes, and I have 7 VMs, with 64GMem, and 1T disk.
Each Es has 32Gmem allocated to it. The index has 20 shards.
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.