Thank you for the reply, Mark.
Heaps are adjusted to 30GB (I liked round numbers :)).
50GB is a good max shard size to keep in mind, and I'll adjust index
groupings as needed based on that.
With regard to the number of indexes, here is what I was thinking, and please
tell me if I'm off base here.
With all log files going to a single daily index, assume log file A is 45GB
of data (in its own _type), and log file B is 5GB of data (in its own
_type). Searching for data in log file B is "penalized" in terms of search
performance because ES loads terms from the index (based on some predictive
algorithm). Also the heap is "penalized" because it now has terms loaded
from this large index that it probably will not need.
If log file B is instead gathered into its own index, then searches against
it should be faster, and there is less pressure on the heap because ES loads
far fewer terms.
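
To make the per-log-file layout concrete, here is a rough sketch of how daily index names could be derived per log source. The "logs-<source>-<date>" naming scheme and both helper functions are hypothetical, made up for illustration only; the one thing taken from ES itself is the /<index>/_search path. It says nothing about whether the split actually helps search speed or heap usage.

```python
from datetime import date


def daily_index_name(log_source: str, day: date, prefix: str = "logs") -> str:
    """Build a per-source daily index name (hypothetical naming scheme)."""
    return f"{prefix}-{log_source.lower()}-{day:%Y.%m.%d}"


def search_path(log_source: str, day: date) -> str:
    """URL path for searching only that source's index for the given day."""
    return f"/{daily_index_name(log_source, day)}/_search"


day = date(2015, 1, 28)
print(daily_index_name("A", day))  # index holding only log file A
print(search_path("B", day))       # search restricted to log file B's index
```

With this layout, a search for log file B never names the index that holds log file A's 45GB.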
Maybe I'm incorrect in my assumptions though about how ES does its work,
and all I really care about is raw index size? Perhaps both the
predictive term loading done by ES and its search logic are savvy enough to
restrict themselves to the _type specified in the query?
Thank you again for your help! I'm getting a better understanding for
sure.
Chris
On Tue, Jan 27, 2015 at 7:01 PM, Mark Walkom markwalkom@gmail.com wrote:
Be aware that we do not yet officially support G1GC. You should also
reduce your heap to 31GB.
Ideally you want to keep shard size below 50GB, so you will need to adjust
things as you grow. Be careful creating a lot of indices though. Each one
carries overhead, and if you increase the number of indices and the amount of
data you have in them you could be wasting resources.
However, when querying, 100 indices with 1 shard is the same as 1 index
with 100 shards.
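
A small sketch of that last point, counting only how many shards a query has to touch. The index names and the shards_searched helper are hypothetical, and the count deliberately ignores the per-index overhead mentioned above.

```python
from typing import Dict


def shards_searched(indices: Dict[str, int]) -> int:
    """Total primary shards a query fans out to across the given indices."""
    return sum(indices.values())


# Index name -> number of primary shards (names are made up).
many_small = {f"logs-{i:03d}": 1 for i in range(100)}
one_large = {"logs-all": 100}

print(shards_searched(many_small))
print(shards_searched(one_large))
```

Both layouts give the same shard count, which is the sense in which they are "the same" at query time.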
On 28 January 2015 at 10:11, Chris Neal chris.neal@derbysoft.net wrote:
Hi all,
I've seen lots of posts about this, and want to make sure I'm
understanding correctly.
Background:
- Our cluster has 6 servers. They are Dell R720xd with 64GB RAM,
2xE5-2600v2 CPU (2 sockets, 6 cores/socket), 16TB disk
- Elasticsearch is set to have 6 shards, and 1 replica, giving two
shards per server. I'm giving ES 32GB heaps on Java 1.7 with G1 GC.
I'm concerned about the size of our indexes. Right now, we store all
data in one index per day, with various types within that to separate data.
The indexes are averaging about 50GB/day (not including replicas). Shard
size is 8GB each.
We have a LOT more data to index. At least 20x more. Should I be
concerned with indexes of that size (~1000GB) and shards of that size
(~160GB)? Is it merely a question of having enough hardware, or is there
more to it?
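
The arithmetic behind those numbers, as a quick sketch. It assumes data spreads evenly across primary shards and uses the 50GB-per-shard guideline that comes up in this thread as the cap; the function names are made up for illustration.

```python
import math


def shard_size_gb(index_size_gb: float, primary_shards: int) -> float:
    """Average primary shard size, assuming data spreads evenly."""
    return index_size_gb / primary_shards


def shards_needed(index_size_gb: float, max_shard_gb: float = 50.0) -> int:
    """Smallest primary shard count keeping each shard at or below the cap."""
    return max(1, math.ceil(index_size_gb / max_shard_gb))


current_daily_gb = 50   # from the thread: ~50GB/day, not including replicas
growth_factor = 20      # "at least 20x more"
projected_gb = current_daily_gb * growth_factor

print(shard_size_gb(current_daily_gb, 6))  # today's shard size with 6 shards
print(shard_size_gb(projected_gb, 6))      # projected shard size with 6 shards
print(shards_needed(projected_gb))         # shards needed to stay under the cap
```

Keeping 6 primary shards at 20x the volume lands each shard well above 50GB, so either the shard count per daily index or the number of indexes would have to grow.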
I'm considering a different indexing strategy that splits the data so
that each index is smaller, but there are more of them. The total amount of
data stays the same, so I'm not sure whether that will make any difference.
If I'm optimizing for searching, does querying multiple smaller indices
perform better than querying fewer larger ones?
Thank you for your time.
Chris
--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/CAND3Dpgr78LJ%3DcWb0ZbyHZqMin4tDSVPvjG%3D_PYgsQym9EzZ%3Dg%40mail.gmail.com
https://groups.google.com/d/msgid/elasticsearch/CAND3Dpgr78LJ%3DcWb0ZbyHZqMin4tDSVPvjG%3D_PYgsQym9EzZ%3Dg%40mail.gmail.com?utm_medium=email&utm_source=footer
.
For more options, visit https://groups.google.com/d/optout.