Shard allocation for large amount of data

Chen_Wang · June 9, 2014, 11:12pm

We have a huge amount of data (5 billion records, 3 TB in size) organized as
parent/child types in one index to enable joins. My first question is:
how should I allocate shards for this big index to make parent/child
queries more efficient? Right now, running queries causes out-of-memory
errors on several nodes. I have 7 VMs, each with 64 GB of memory and 1 TB of
disk. Each ES instance has 32 GB of memory allocated to it. The index has 20 shards.

Any insights are helpful!
Thanks,
Chen

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CACim9RkMgWAxAZnLagKjnZd_saoQdP0Gof7t0-MsK97d4F--yw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

warkolm · June 9, 2014, 11:23pm

You need to add more nodes.
Changing shard layout is unlikely to help if you're getting OOM.
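
For a sense of scale, here is a rough back-of-the-envelope sketch using only the figures quoted in the question. It assumes the 3 TB is spread evenly across shards and nodes, that 1 TB = 1000 GB, and that the 32 GB per node is JVM heap; the function name `cluster_estimates` and its keys are made up for illustration.

```python
# Figures quoted in the question.
TOTAL_DOCS = 5_000_000_000
TOTAL_DATA_GB = 3_000        # 3 TB, assuming 1 TB = 1000 GB
SHARDS = 20
NODES = 7
HEAP_PER_NODE_GB = 32        # assumed to be JVM heap per ES instance


def cluster_estimates(docs, data_gb, shards, nodes, heap_gb):
    """Return rough per-shard and per-node figures, assuming even distribution."""
    total_heap_gb = nodes * heap_gb
    return {
        "docs_per_shard": docs / shards,
        "gb_per_shard": data_gb / shards,
        "shards_per_node": shards / nodes,
        "gb_per_node": data_gb / nodes,
        "total_heap_gb": total_heap_gb,
        "data_to_heap_ratio": data_gb / total_heap_gb,
    }


estimates = cluster_estimates(
    TOTAL_DOCS, TOTAL_DATA_GB, SHARDS, NODES, HEAP_PER_NODE_GB
)
for name, value in estimates.items():
    print(f"{name}: {value:,.1f}")
```

In this sketch, changing the shard count only moves the per-shard figures; the cluster-wide data-to-heap ratio changes only when nodes (or heap per node) are added.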

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com
