I have increased the heap size to 6GB per node and it works. However, I am
not entirely sure I am happy with this solution, since the degradation
point still seems pretty unpredictable to me at the moment.
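
In case it is useful, one way to keep an eye on how close each node is
getting to that point would be to poll the nodes stats API for JVM heap
usage. This is only a rough sketch, not what I actually run: the
localhost:9200 endpoint is a placeholder and the exact field names may
differ between versions.

    import requests

    ES = "http://localhost:9200"  # placeholder endpoint

    # Nodes stats, restricted to the JVM section (per-node heap usage).
    stats = requests.get(f"{ES}/_nodes/stats/jvm").json()

    for node_id, node in stats["nodes"].items():
        mem = node["jvm"]["mem"]
        used = mem["heap_used_in_bytes"]
        limit = mem["heap_max_in_bytes"]
        print(f"{node['name']}: {100.0 * used / limit:.1f}% heap used")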
One thing I should mention, however, is that after tweaking my client code
to create a huge number of small indexes, I was able to index the 40GB of
data on the cluster with the same settings. The downside is that I need to
be able to query the data across all indexes for specific types, which
means much bigger response times for queries. With my current number of
documents, after optimizing the number of segments per index, I got
response times down to a few seconds over 10 for a basic query.
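
To make the trade-off concrete, this is roughly the shape of the two
operations I am talking about: forcing each small index down to a single
segment, and then searching one type across all of the indexes with a
wildcard pattern. The index names, the type name and the endpoint are
placeholders, and on newer Elasticsearch releases the _optimize call is
named _forcemerge instead.

    import requests

    ES = "http://localhost:9200"  # placeholder endpoint

    # Reduce each small index to a single segment after bulk indexing
    # (_optimize on older releases; _forcemerge on newer ones).
    for index in ("events-000", "events-001", "events-002"):  # placeholder names
        requests.post(f"{ES}/{index}/_optimize", params={"max_num_segments": 1})

    # Search one specific type across every small index at once,
    # using a wildcard index pattern instead of listing each index.
    resp = requests.post(
        f"{ES}/events-*/my_type/_search",  # my_type is a placeholder type
        json={"query": {"match_all": {}}, "size": 10},
    )
    print(resp.json()["hits"]["total"])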
With a large number of small indexes, it seems that the degradation never
happens (at least not within the limits of the data I was working with),
assuming there is enough heap to support it (smaller merges and flushes?).
Is there a downside, besides query response times and complexity, to having
a huge number of smaller indexes as opposed to a small number of huge
indexes?
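
For what it is worth, the "smaller merges and flushes" guess could probably
be checked against the index stats rather than left as a question mark.
A minimal sketch, again with placeholder names for the endpoint and the
index pattern, and with the response field names as I understand them from
the stats output:

    import requests

    ES = "http://localhost:9200"  # placeholder endpoint

    # Cumulative merge / flush / segment figures for all of the small indexes.
    stats = requests.get(f"{ES}/events-*/_stats").json()

    for name, idx in stats["indices"].items():
        total = idx["total"]
        print(
            name,
            "merges:", total["merges"]["total"],
            "flushes:", total["flush"]["total"],
            "segments:", total["segments"]["count"],
        )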
Thanks