Hi there,
I am new to Elasticsearch and Lucene and have been prototyping my business
case with a single node, both locally and in AWS. I want to move to the next
step and start building out an ES cluster in AWS. One thing I am trying to
figure out is the recommended number of nodes, the hardware specs for those
nodes, and the sharding/index strategy I should use for my business case.
Any advice, best practices, or tips would be much appreciated. Here is some
info about the documents I am indexing.
The documents I am indexing represent an object, let's say a user. For each
user I have facts that I need to retain on a weekly basis for possibly 2
years. I also need to be able to query and sort on deltas of those facts. To
support sorting on fact deltas, each object contains all the facts for every
week, and I use the custom score query with a script to do something like:
"doc.facts.week1.fact2 - doc.facts.week2.fact2;"
For this case the document JSON looks something like this:
{
  "name": "User1",
  "facts": {
    "week1": {
      "fact1": 10,
      "fact2": 100
    },
    "week2": {
      "fact1": 30,
      "fact2": 500
    }
  }
}
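For concreteness, the delta-sort query I have been prototyping is shaped
roughly like this (a sketch, not my exact query; it assumes the custom_score
query available in the 0.90-era API, and assumes the nested fact fields are
addressed by their flattened paths in doc values):

```json
{
  "query": {
    "custom_score": {
      "query": { "match_all": {} },
      "script": "doc['facts.week1.fact2'].value - doc['facts.week2.fact2'].value"
    }
  }
}
```

The script's value becomes the document score, so results come back sorted
by the week-over-week delta.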
My actual documents have a bit more in them, so here are some sizing details:
Average document size with 10 weeks of data: 178.5 KB
Extrapolated to 104 weeks (2 years) of data: 1.74 MB per doc
Current estimates put about 4MM objects in the system, which can grow to
10MM objects in the next year.
So to start out, the cluster would hold about 680 GB and could grow to
16 TB+ over the next two years.
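For anyone checking my math, here is the back-of-the-envelope calculation
behind those totals (a sketch; it assumes binary units, i.e. 1 GB = 1024^2 KB,
and counts primary data only, before replicas):

```python
# Back-of-the-envelope sizing from the figures above
# (binary units assumed; primaries only, no replicas).
avg_doc_kb_now = 178.5       # avg doc size with 10 weeks of data
avg_doc_mb_2yr = 1.74        # extrapolated to 104 weeks of data
docs_now = 4_000_000         # current object count estimate
docs_next_year = 10_000_000  # projected object count

gb_now = avg_doc_kb_now * docs_now / 1024 ** 2
tb_2yr = avg_doc_mb_2yr * docs_next_year / 1024 ** 2

print(f"initial: ~{gb_now:.0f} GB of primary data")  # roughly the 680 GB above
print(f"2 years: ~{tb_2yr:.1f} TB of primary data")  # roughly the 16 TB+ above
```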
Finally, I am also trying to understand best practice for a sharding/index
strategy. My understanding is that querying against 1 index with 2 shards is
equivalent to querying against 2 indices with 1 shard each. However, the
number of shards is fixed when an index is created and cannot be changed
afterwards. My concern is what the best strategy would be so that an index,
or a single shard within an index, does not get too big for one node to
handle, and what can be done if it is approaching that size. Here is some
more info about my data to inform a possible sharding/index strategy.
The data is partitioned by accounts. Accounts have regions, and regions
contain users. Most accounts have around 3-5 regions and 10,000-50,000 users
in total. Some larger accounts have about 100 regions with around 100,000
users.
My initial thought was to create an index per account, or an index per
region, with just 1 shard and 1 or 2 replicas for redundancy. But I wasn't
sure what I would do if an index became too large for one node to handle.
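To make the per-account idea concrete, index creation would look roughly
like this (a sketch; the index name "account-acme" is made up for
illustration, and it assumes the REST API is reachable on localhost:9200):

```shell
# Create one index per account: a single primary shard plus one replica.
# ("account-acme" is a hypothetical account name.)
curl -XPUT 'http://localhost:9200/account-acme' -d '{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}'
```

With one shard per account index, each account's data stays together, but a
single very large account would still be limited to what one node can hold,
which is exactly the concern above.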
Please let me know if more info would be useful.
Thanks in advance for any help.
-dave
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.