Graceful way to un-overwhelm the ES

We're on AWS and ran into some very similar problems. I ended up moving
to m2.xlarge nodes with 12GB for ES across 2 nodes, and indexing ran
very well. We're up to 420M docs.

  • Craig

On Thu, May 10, 2012 at 2:59 PM, andym imwellnow@gmail.com wrote:

Yes, it's a single node and ES_HEAP_SIZE is set to 5120M (not 520M).

You're right, I'll have to move to a bigger machine or split it into
2 machines, as this started happening relatively recently (at ~300M docs).

My question is whether these restarts are safe at the moment and do
not lead to data loss in ES (where ES would return "OK" to the processing
threads, which would mark jobs as completed, but then ES would not
persist the documents because of the restart). ES is currently running with
threadpool.index.type: cached
threadpool.bulk.type: cached

I tried to make these "blocking", but then the processing threads were
idle most of the time, just waiting for ES to return.
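
(In case it helps: a rough middle ground between "cached" and "blocking" is
the fixed pool type with a bounded queue, so excess requests get rejected and
your client-side back-off kicks in instead of work piling up in memory. A
sketch, assuming the fixed type and its size/queue_size settings exist on the
0.19.x line; the numbers are made-up starting points, not recommendations:

  threadpool.index.type: fixed
  threadpool.index.size: 30
  threadpool.index.queue_size: 200
  threadpool.bulk.type: fixed
  threadpool.bulk.size: 30
  threadpool.bulk.queue_size: 200

)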

On May 10, 4:48 pm, Shay Banon kim...@gmail.com wrote:

We are talking about a single ES node, right? For the amount of data that
you have indexed, it seems like you are hitting memory limits; 520MB is not
enough for that much data. You should probably go to 3.5 or 4GB (out of
the 7GB this instance type has) as ES_HEAP_SIZE.
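
(For reference, a minimal sketch of bumping the heap with the stock startup
script, assuming it reads ES_HEAP_SIZE as usual; with the service wrapper the
equivalent is typically set in the wrapper's own elasticsearch.conf instead:

  # give ES roughly half of the instance RAM, leaving the rest for the OS cache
  export ES_HEAP_SIZE=4g
  bin/elasticsearch

)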

On Thu, May 10, 2012 at 11:38 PM, andym imwell...@gmail.com wrote:

Hi,

I am currently running indexing on c1.xlarge (with 4 ephemeral drives
in RAID0 and the gateway going to S3). Everything works great, except every
several hours ES gets overwhelmed and indexing slows significantly.
From what I can see in bigdesk, during "normal" ES operation the "Heap
Mem" window shows a sawtooth pattern, but when ES gets overwhelmed it
seems like no GC happens (no sawtooth pattern in bigdesk) and memory is
maxed out (configured at 5120M).
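
(A rough way to confirm the same thing without bigdesk is to poll the nodes
stats API and watch the heap and GC numbers; the exact path and flags vary by
version, and on the 0.19.x line it lived, if I remember right, under
_cluster/nodes:

  curl -s 'http://localhost:9200/_cluster/nodes/stats?jvm=true&pretty'

Look at jvm.mem.heap_used vs heap_committed and at the GC collection
counts/times; a heap pinned near the maximum with no GC progress matches the
picture described above.)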

Restarting the ES service (through "bin/service/elasticsearch restart")
solves the problem for a few hours, but then the problem re-appears.

I wonder whether restarting ES when it is in such a state is going to lead
to any data loss, so that I can put this into a cron job to make sure
indexing continues (or whether there are any better ways to address the
problem).
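
(One note on the data-loss worry: before a scripted restart it should help to
force a flush so the transaction log is committed to the index. This is only a
sketch of the idea using the standard flush API, not a guarantee that nothing
acknowledged-but-unflushed can be lost:

  # flush all indices, then restart through the service wrapper
  curl -XPOST 'http://localhost:9200/_flush'
  bin/service/elasticsearch restart

)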

Thanks,

-- Andy

P.S. Some background: I am running ES 0.19.2 with "refresh_interval" set
to zero, and ES currently has about 400 million documents in 2 indexes
with about 600GB total index size (I expect about 600M more docs, with
around 1TB of data; the mapping has _source set to compressed).
The data processing and insertion into ES is done by multiple threads
on 20 or so m1.xlarge machines (when ES goes down or returns errors, the
threads back off with an exponential timeout and restart when ES is back
on-line). There are 8-12 threads per machine doing mostly data
processing, and if I can trust that "HTTP channels" in bigdesk indicates
the number of active connections, then 30-40 threads are connected to ES
at any given time. The indexing rate is about 750 docs per second,
sometimes maxing out at 10,000 docs per second. The average doc size is
about 5000 bytes.
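
(Side note on the "refresh_interval set to zero" part: a minimal sketch of
changing it per index through the update-settings API, which was dynamic on
the 0.19.x line as far as I know. "myindex" is just a placeholder, and "-1" is
the value that fully disables automatic refresh during bulk loading, which may
be what was intended by "zero":

  curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{
    "index" : { "refresh_interval" : "-1" }
  }'

)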

--

CRAIG BROWN
chief architect
youwho, Inc.

www.youwho.com

T: 801.855.0921
M: 801.913.0939