Thanks Mike. I’m still a bit unclear on these comments:
IndexReader requires some RAM for each segment to hold structures like live docs, the terms index, and index data structures for doc-values fields, and it holds open a number of file descriptors proportional to the number of segments in the index.
There is also a per-indexed-field cost in Lucene; if you have a great many unique indexed fields, that may matter.
Aren’t these structures dependent on the size of the “Lucene index”? Say I have 1 large Lucene index vs. 10 small Lucene indices (assuming not much duplicated data across indices): wouldn’t the total memory used be the same? I understand that there will be more file descriptors because there will be more segments overall.
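To make sure I’m testing the right thing, here is roughly what I plan to measure first: the total number of segment readers held open under each layout, since each leaf carries its own terms index, live docs, and so on. This is just a minimal sketch against the plain Lucene API (assuming Lucene 5.x-style FSDirectory.open; the index paths are placeholders):

import java.nio.file.Paths;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.store.FSDirectory;

// Sketch: the per-index overhead tracks segment (leaf) count, not byte
// size, so compare total leaves for 1 large index vs. 10 small ones.
public class SegmentCount {
  public static void main(String[] args) throws Exception {
    int totalLeaves = 0;
    for (String path : args) { // e.g. /data/index-0 ... /data/index-9 (placeholders)
      try (DirectoryReader reader = DirectoryReader.open(FSDirectory.open(Paths.get(path)))) {
        System.out.println(path + ": " + reader.leaves().size() + " segments");
        totalLeaves += reader.leaves().size();
      }
    }
    System.out.println("total segment readers held open: " + totalLeaves);
  }
}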
IndexWriter has a RAM buffer (indices.memory.index_buffer_size in ES) to hold recently indexed/deleted documents, and it periodically opens readers (10 at a time by default) to do merging, which bumps up RAM and file-descriptor usage while the merge runs.
According to the doc at https://github.com/elasticsearch/elasticsearch/blob/master/docs/reference/modules/indices.asciidoc, it seems that indices.memory.index_buffer_size is the “total” size of the buffer for all the shards on a node, so I’m not sure how this would matter in the case of having too many shards. I understand that there will be more file descriptors and a lot more “smaller” merge jobs running.
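To make my confusion concrete, the arithmetic I have in mind is just this (all numbers made up for illustration):

// Back-of-the-envelope: a fixed node-wide indexing buffer divided among
// active shards; the numbers here are made up for illustration.
public class BufferPerShard {
  public static void main(String[] args) {
    long indexBufferBytes = 1024L * 1024 * 1024; // e.g. ~10% of a 10 GB heap
    for (int shards : new int[] {2, 20, 200}) {
      long perShardMB = indexBufferBytes / shards / (1024 * 1024);
      // A tiny per-shard buffer fills and flushes quickly, producing
      // small segments and therefore many more small merges.
      System.out.println(shards + " shards -> ~" + perShardMB + " MB buffer per shard");
    }
  }
}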
I’m going to test this myself, but I just wanted to understand the model better first so that my tests are more accurate.
Thanks again,
Drew
On Jan 23, 2015, at 2:18 AM, Michael McCandless <mike@elasticsearch.com> wrote:
There is definitely a non-trivial per-index cost.
From Lucene's standpoint, ES holds open an IndexReader (for searching) and an IndexWriter (for indexing).
IndexReader requires some RAM for each segment to hold structures like live docs, the terms index, and index data structures for doc-values fields, and it holds open a number of file descriptors proportional to the number of segments in the index.
IndexWriter has a RAM buffer (indices.memory.index_buffer_size in ES) to hold recently indexed/deleted documents, and it periodically opens readers (10 at a time by default) to do merging, which bumps up RAM and file-descriptor usage while the merge runs.
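In plain Lucene terms, the two knobs involved look roughly like this; this is only a sketch (assuming Lucene 5.x-style constructors) and the values are illustrative, not recommendations:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.ConcurrentMergeScheduler;
import org.apache.lucene.index.IndexWriterConfig;

public class WriterKnobs {
  public static void main(String[] args) {
    IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());

    // The RAM buffer holding recently indexed/deleted docs; ES sizes
    // this per node via indices.memory.index_buffer_size.
    config.setRAMBufferSizeMB(64.0); // illustrative value

    // The merge scheduler caps how many merges run at once; each
    // running merge opens segment readers, which bumps RAM and file
    // descriptors until it completes.
    ConcurrentMergeScheduler cms = new ConcurrentMergeScheduler();
    cms.setMaxMergesAndThreads(10, 3); // maxMergeCount, maxThreadCount
    config.setMergeScheduler(cms);
  }
}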
There is also a per-indexed-field cost in Lucene; if you have a great many unique indexed fields, that may matter.
If you use field data, it's entirely RAM-resident (doc values are a better choice since they use much less RAM).
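At indexing time the difference is just which field type you add; a small sketch (field names made up):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.SortedSetDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.util.BytesRef;

public class DocValuesExample {
  static Document makeDoc() {
    Document doc = new Document();
    // Indexed as usual for searching...
    doc.add(new StringField("customer", "acme", Field.Store.NO));
    // ...plus doc-values columns for sorting/aggregations; these live
    // mostly on disk instead of being un-inverted into heap (field
    // data) at search time.
    doc.add(new SortedSetDocValuesField("customer", new BytesRef("acme")));
    doc.add(new NumericDocValuesField("timestamp", 1421971080000L));
    return doc;
  }
}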
ES has common thread pools on the node, shared by all operations across all shards on that node, so I don't think more indices translate into more threads.
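(As a toy model, not ES code: one fixed node-level pool serves work for any number of shards, so the thread count stays flat as shard count grows.)

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Toy illustration only: 4 threads handle "search" work for 100 shards.
public class SharedPool {
  public static void main(String[] args) {
    ExecutorService searchPool = Executors.newFixedThreadPool(4);
    for (int shard = 0; shard < 100; shard++) {
      final int s = shard;
      searchPool.submit(() -> System.out.println(
          Thread.currentThread().getName() + " searching shard " + s));
    }
    searchPool.shutdown();
  }
}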
Net/net, you really should just conduct your own tests to get a feel for resource consumption in your use case...
Mike McCandless
http://blog.mikemccandless.com/
On Thu, Jan 22, 2015 at 4:07 PM, Drew Kutcharian <drew@venarc.com> wrote:
Hi,
I just came across this blog post on Changing Bits: “Lucene's RAM usage for searching” (http://blog.mikemccandless.com/2010/07/lucenes-ram-usage-for-searching.html).
It seems a lot of work has been done on Lucene to reduce its memory requirements, with even more in Lucene 5.0. This is especially interesting to me since I’m working on a project that uses Elasticsearch, and we are planning on a 1-index-per-customer model (each with 1 or maybe 2 shards and no replicas) plus shard allocation, mainly because:
- We are going to have a few thousand customers at most.
- Each customer will only need access to their own data (no global queries).
- The indices are going to be relatively large (each with millions of small docs).
- We are going to need to do a lot of parent/child-type queries, and ES doesn’t support cross-shard parent/child relationships, and the parent id cache seems not that efficient (see http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/parent-child.html and https://github.com/elasticsearch/elasticsearch/issues/3516#issuecomment-23081662). This is the main reason we feel we can’t use time-based (daily, monthly, …) indices.
- We want to be able to easily “drop” an index if a customer leaves after the initial trial.
I wanted to better understand the overheads of an Elasticsearch shard. Is it just memory, or CPU/threads too? Where can I find more information about this?
Thanks,
Drew