Hello,
We are in the midst of transitioning much of our data into ES, but have
been running into performance issues loading data into our index.
As recommended in other threads on this group, I've created a separate
testbed for understanding the interaction of ES with our data.
Summary question:
- What are the main resource constraints and flags that indicate a need for
hardware or cluster scaling? Is it a maximum number of documents of a given
type per shard, plus the corresponding disk and memory footprint while it
runs? Our experience below makes it difficult (for me, currently) to see a
clear decision-making path based on the behavior I'm observing.
Long story:
We have 2 major document types, so we've created 2 indices, one for each
type. For our testing to assess the "limits" of our data and ES, we set them
up as single-shard indices.
This is all on a single node: a Rackspace Cloud 8GB machine with 4 CPUs.
We've set the Java heap to 75% of available RAM, so that's 6GB.
We're running bulk indexing against the single-shard indices with the
following settings:
{
  "index": {
    "number_of_replicas": "0",
    "merge.policy.merge_factor": 30,
    "refresh_interval": "-1",
    "store.throttle.max_bytes_per_sec": "5mb",
    "store.throttle.type": "merge"
  }
}
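For reference, this is roughly how those settings get pushed to an index
before a load; a minimal sketch using Python's requests library against the
update-settings API, with the host and index name as placeholders:

import json
import requests

ES = "http://localhost:9200"   # placeholder host
INDEX = "case_index"           # placeholder index name

# The bulk-load settings from above, applied to an existing index
# via the update-settings endpoint.
bulk_load_settings = {
    "index": {
        "number_of_replicas": "0",
        "merge.policy.merge_factor": 30,
        "refresh_interval": "-1",
        "store.throttle.max_bytes_per_sec": "5mb",
        "store.throttle.type": "merge"
    }
}

resp = requests.put("%s/%s/_settings" % (ES, INDEX),
                    data=json.dumps(bulk_load_settings))
resp.raise_for_status()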
We are simultaneously running bulk operations against the two document types
and their respective indices.
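Each 500-document bulk POST looks roughly like the sketch below; the host,
index, doc type, and payload are stand-ins for our real data and mappings:

import json
import time
import requests

ES = "http://localhost:9200"   # placeholder host

def bulk_index(index, doc_type, docs):
    # Build the newline-delimited _bulk body: one action line plus one
    # source line per document.
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    body = "\n".join(lines) + "\n"   # the bulk body must end with a newline
    start = time.time()
    resp = requests.post(ES + "/_bulk", data=body)
    resp.raise_for_status()
    return (time.time() - start) * 1000.0

# e.g. a batch of 500 stand-in documents
elapsed = bulk_index("form_index", "form", [{"field": i} for i in range(500)])
print("500-doc bulk POST took %.0f ms" % elapsed)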
On one index, FORM_INDEX, it's humming along fine: bulk index operations of
500 documents have never taken more than 2-3 seconds per POST, and are
usually in the 100-500ms range. As of this writing, the index (and its lone
shard) is 1.2gb and 300,000 documents, with no change in performance. So far
so good!
On the other index, CASE_INDEX, it was also humming along fine at first: bulk
index operations of 500 documents were in the sub-1000ms range. However, once
it hit 200,000 documents, the POSTs started creeping up: 10 seconds, 30, 60,
120 seconds. It was interminably slow for about an hour, then inexplicably
sped back up to sub-1-second bulk insert times. It experienced another blowup
in insertion times, but is now back to sub-1500ms insertion times.
The CASE_INDEX stats right now: 7gb index (and lone shard) size, 266,000
documents.
ES is reporting its heap at 943mb (per bigdesk).
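In case it helps, those numbers come from the stats APIs; roughly like this
sketch (same placeholder host and index as above; bigdesk is charting the
same node stats):

import requests

ES = "http://localhost:9200"   # placeholder host

# Per-index doc count and primary store size (the CASE_INDEX figures above).
stats = requests.get(ES + "/case_index/_stats").json()
primaries = stats["indices"]["case_index"]["primaries"]
print("docs:", primaries["docs"]["count"])
print("store bytes:", primaries["store"]["size_in_bytes"])

# JVM heap per node (what bigdesk is graphing). On older 0.90.x releases the
# ?jvm=true flag may be needed; newer versions return jvm stats by default.
nodes = requests.get(ES + "/_nodes/stats?jvm=true").json()
for node_id, node in nodes["nodes"].items():
    print(node["name"], node["jvm"]["mem"]["heap_used_in_bytes"])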
So initially I'm thinking it's understandable that CASE_INDEX would slow down
relative to FORM_INDEX, given the complexity of the mappings we have defined
and the indexing we do on it, but I would have expected a more graceful
decline in performance signaling "you are reaching the end of the line on
this shard." Instead it's rather bursty.
My main question: I'm seeking a decision-making heuristic of some sort for
when to add resources to our cluster versus adjusting our shard count and JVM
settings. Is my desired outcome some magic-number limit on num_docs and size
for CASE_INDEX and FORM_INDEX, respectively, for a given machine RAM size?
I ask this because I ran into a similar precipitous decline in bulk-indexed
docs per second on our other testbed with 5-shard indices, and it likewise
magically recovered with no intervention on our end. That's the same 8GB
hardware running CASE_INDEX and FORM_INDEX at 5 shards each, with 500k and 1
million docs respectively and the heap at 4gb out of 6gb; a similar bulk
indexing run hit the same alternating fast/slow insertion rates.
I'm ill-equipped to explain and justify throwing more (virtual) hardware at
our environment when our single-shard setup exhibits the same behavior as our
presumably overburdened 5-shard setup.
Thanks,
Dan