ES picks up the translog and replays it on the next start after a node has
gone down.
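If you want to bound how much unflushed data can sit in the translog at any time, you can tighten the flush thresholds per index. A rough sketch in Python (untested; the index name "myindex" is a placeholder and the setting names are taken from the 0.90/1.0-era docs, so verify them against your version first):

import json
import requests

ES = "http://localhost:9200"

# Flush (commit the Lucene segments and clear the translog) after this many
# operations, or once the translog grows past this size, whichever comes first.
settings = {
    "index": {
        "translog.flush_threshold_ops": 5000,
        "translog.flush_threshold_size": "200mb",
    }
}

resp = requests.put(ES + "/myindex/_settings", data=json.dumps(settings))
print(resp.text)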
You work heavily with facets, and I share your concerns about OOMs
introducing flakiness into the whole cluster.
Have you checked how your caches use the heap? Note that some caches are
enabled by default and may interfere with your facets.
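For example, a quick way to see how much heap the caches actually hold is the nodes stats API. A rough sketch (the JSON field names follow the stats format of that era; double-check them on your version):

import requests

ES = "http://localhost:9200"

# Print the field data and filter cache sizes per node.
stats = requests.get(ES + "/_nodes/stats").json()
for node_id, node in stats["nodes"].items():
    indices = node.get("indices", {})
    fielddata = indices.get("fielddata", {}).get("memory_size_in_bytes", 0)
    filter_cache = indices.get("filter_cache", {}).get("memory_size_in_bytes", 0)
    print("%s: fielddata=%d bytes, filter_cache=%d bytes"
          % (node.get("name", node_id), fielddata, filter_cache))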
You should also look into the new aggregation framework in 1.0.0.Beta2 to
see whether the new aggregations are less resource-consuming than faceting.
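The request body is almost the same; a small sketch comparing a terms facet with the equivalent terms aggregation (index "myindex" and field "tag" are just placeholders):

import json
import requests

ES = "http://localhost:9200"

# Classic terms facet.
facet_body = {
    "size": 0,
    "query": {"match_all": {}},
    "facets": {"tags": {"terms": {"field": "tag", "size": 10}}},
}

# Equivalent terms aggregation from 1.0.0.Beta2.
agg_body = {
    "size": 0,
    "query": {"match_all": {}},
    "aggs": {"tags": {"terms": {"field": "tag", "size": 10}}},
}

for body in (facet_body, agg_body):
    resp = requests.post(ES + "/myindex/_search", data=json.dumps(body))
    print(json.dumps(resp.json(), indent=2))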
Regarding indexing, check whether you have a strategy for segment merging
and throttling. Setting custom values can take a lot of pressure off the
heap, especially when segments grow very large.
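As a starting point, something like the following (the 20mb cap is only an example, not a recommendation; the setting names existed in the 0.90/1.0 era, so check them against your version):

import json
import requests

ES = "http://localhost:9200"

# Throttle only merge traffic, and cap merge I/O cluster-wide.
settings = {
    "transient": {
        "indices.store.throttle.type": "merge",
        "indices.store.throttle.max_bytes_per_sec": "20mb",
    }
}

resp = requests.put(ES + "/_cluster/settings", data=json.dumps(settings))
print(resp.text)

If merges start falling behind and segment counts climb, raise the cap again; throttling too aggressively only defers the work.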
And finally, check whether you can identify the sweet spot for adding nodes
when heap usage simply gets too high.
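A simple way to watch for that is to poll heap usage per node, e.g. (the 75% threshold is an arbitrary example, and depending on your version you may need to request the jvm section of the stats explicitly):

import requests

ES = "http://localhost:9200"
HEAP_WARN_PERCENT = 75  # arbitrary example threshold

stats = requests.get(ES + "/_nodes/stats").json()
for node_id, node in stats["nodes"].items():
    mem = node.get("jvm", {}).get("mem")
    if mem is None:
        continue  # some versions need the jvm section requested explicitly
    percent = 100.0 * mem["heap_used_in_bytes"] / mem["heap_max_in_bytes"]
    flag = "  <-- consider adding a node" if percent > HEAP_WARN_PERCENT else ""
    print("%s: heap %.1f%% used%s" % (node.get("name", node_id), percent, flag))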
Jörg
On Wed, Dec 11, 2013 at 10:03 AM, Vaidik Kapoor kapoor.vaidik@gmail.com wrote:
For our use case, we need facets desperately. Otherwise we will have to do
that in the application logic, which is not ideal and honestly a lot of
work too. ES gives me that. However, with the number of documents we need
to index per second (30-50 per second, and this number is going to grow
with time), I wonder what people do to make sure that:
- There is the least chance of data loss. You cannot flush segments to disk
very quickly as that won't be optimal, and to my knowledge a lot of
unoptimized segments will be created if I manually use the Flush API. So
what does ES do when data has been written to the translog but the
operations have not been flushed and the node goes down?
- If there is scope for data loss, then what does one do to detect it?
Being new to ES, I am still trying to understand where exactly ES is using
the JVM heap and how that can affect my cluster. Consider this: we have
three nodes and we are indexing data at the rate of 30-50 docs per second.
When I started the cluster, JVM heap usage was low (about 2-4% on each
node). With time, it keeps growing and stabilizes between 81-94%. In the
meanwhile I am just indexing and not querying data at all from the cluster.
I am using G1 GC instead of CMS GC because CMS was giving me very long
pauses (about 13-17 seconds for garbage collection), which is not ideal.
With G1, GC is frequent and quicker (so far I have seen about 1 second).
This works, but I am always concerned that JVM heap usage is so high, and I
wonder what will happen if a little more load comes in. Will ES be able to
take it, or are there chances of OutOfMemory exceptions leading the node to
go down? Obviously, this is something that I will have to test for my own
use case, but I am interested in knowing if there is someone around here
who has experienced similar problems and has found a solution or a
workaround.
After some time, GC happens so frequently that I can tell it is affecting
indexing (I am indexing using a RabbitMQ consumer written in Python, and
every 10-20 seconds I see a peak in the queue, suggesting that the consumer
is not able to keep up, which further suggests that the consumer cannot
write to ES quickly enough, leading me to assume that GC is the cause of
the slow writes as the CPU is busy).
So:
- What are/could be the reasons for such heap usage? What is ES doing with
so much heap?
- How can I keep that under control?
Thanks
Vaidik Kapoor
vaidikkapoor.info
On 11 December 2013 08:22, Eugene Strokin eugene@strokin.info wrote:
I have used ES as a primary data source since version 0.2. It has been in
production for almost 2 years, starting with 5 shards all on the same node
and growing to 1 replica of those 5 shards on 3 nodes. It serves about a
dozen requests per second on average, and all kinds of requests: searches,
filtering, sorting, faceting. I have even transferred the whole cluster to
different datacenters with zero downtime, several times. All the problems I
had were only because I did something wrong; it was never the fault of ES.
So I can say now that ES can be used as the only data store.
I've tried several other options: Solr - too hard to scale; Cassandra - not
easy to support a complex (and I wouldn't even really call it complex) data
structure; HBase on Hadoop - too low level. And the performance of ES is
very impressive compared to the others.