=== Context ===
We have a cluster with five nodes on virtual machines. The machines are
running CentOS on OpenVZ, and we're using ES beta2. Four of the machines
have 4GB of memory and 3 cores each; the fifth has 6GB of memory and 4
cores. On all machines ES is given half of the memory.
The cluster hosts two indexes. Index1 has two types, A and B. A documents
are fairly small but have one field of type string array which can hold up
to 500 elements. B documents are very small. There's a parent/child
mapping, mapping B documents as children of A documents. B documents have
a ttl of 30 days, but no B document is yet that old. For both A and B there's
a dynamic mapping that maps all strings as not analyzed. Index1 has its
refresh interval set to 30s.
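Roughly, the Index1 setup looks like this (the field names are made up and
only the parts relevant to the description above are shown):

  curl -XPUT 'http://localhost:9200/index1' -d '{
    "settings": { "index.refresh_interval": "30s" },
    "mappings": {
      "A": {
        "dynamic_templates": [{
          "strings_not_analyzed": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": { "type": "string", "index": "not_analyzed" }
          }
        }]
      },
      "B": {
        "_parent": { "type": "A" },
        "_ttl": { "enabled": true, "default": "30d" },
        "dynamic_templates": [{
          "strings_not_analyzed": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": { "type": "string", "index": "not_analyzed" }
          }
        }]
      }
    }
  }'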
Index2 has a single type, C. C documents are fairly large and have a couple
of string fields that are analyzed using the snowball analyzer. Index2 has
its refresh interval set to 2s.
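Index2 is set up roughly like this ("title" and "body" are made-up field
names):

  curl -XPUT 'http://localhost:9200/index2' -d '{
    "settings": { "index.refresh_interval": "2s" },
    "mappings": {
      "C": {
        "properties": {
          "title": { "type": "string", "analyzer": "snowball" },
          "body":  { "type": "string", "analyzer": "snowball" }
        }
      }
    }
  }'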
The application deals with 50-100 requests/second. For each request the
following happens:
- A bulk request updating an A document and indexing a new B document
(sketched after this list). The A document is updated using a fairly heavy
mvel script, with an upsert in case the document doesn't already exist. The
B document is always a child of the A document.
- An mget request for one C document and one A document. The A document is
always the same A document as the one being updated. Only a single field
from the C document is retrieved.
- For about half of the requests, an additional mget request is made,
fetching 25 more C documents.
- Also, recently we had an incident where one of the nodes ran out of disk
space. This caused some updates of A documents to fail with an EOF
exception as the document couldn't be retrieved. We're dealing with this by
detecting failed updates of A documents and then making an additional index
request to overwrite the "corrupt" A document.
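To make the per-request flow concrete, the two main requests look roughly
like this (index/type/field names and ids are made up, and the real mvel
script is heavier):

  # bulk: script/upsert update of an A document plus indexing a child B document
  curl -XPOST 'http://localhost:9200/_bulk' -d '
  { "update": { "_index": "index1", "_type": "A", "_id": "a-123" } }
  { "script": "ctx._source.tags += tag", "params": { "tag": "x" }, "upsert": { "tags": ["x"] } }
  { "index": { "_index": "index1", "_type": "B", "_parent": "a-123" } }
  { "field": "value" }
  '

  # mget: one C document (single field) plus the same A document
  curl -XPOST 'http://localhost:9200/_mget' -d '{
    "docs": [
      { "_index": "index2", "_type": "C", "_id": "c-1", "fields": ["some_field"] },
      { "_index": "index1", "_type": "A", "_id": "a-123" }
    ]
  }'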
In addition to the above about 3 search requests are made each second and
about one C document is indexed per second.
The indexes currently hold about 5M A documents, 50M B documents and 30K C
documents.
All of the nodes report 100% memory usage in the virtualization control
panel and in free, but less in top. CPU usage is around 30%. In the ES slow
logs we see indexing times of up to 10s every couple of minutes, and search
times of 3-10s every 5 minutes. Both are acceptable from the application's
perspective.
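The slow logs mentioned above are enabled with thresholds along these lines
(the values here are a sketch, not our exact production settings):

  index.search.slowlog.threshold.query.warn: 1s
  index.indexing.slowlog.threshold.index.warn: 1s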
Below are our current cluster settings:
"threadpool.index.queue_size" : "300",
"threadpool.index.queue_size2" : "300",
"threadpool.bulk.queue_size" : "300",
"threadpool.get.type" : "cached",
"threadpool.index.type" : "cached",
"threadpool.bulk.type" : "cached"
=== Problem ===
Logging at the application level shows that 10-30% of the mget requests
made per application request take 1 to 8 seconds, which isn't acceptable.
The other 70-90% seem to be really fast. The same pattern appears when
making GET requests directly to the nodes: some requests complete in
2-10ms while others take one or several seconds. This is true no matter
which node the requests are made to, including when making the request from
the node itself. Specifying preference=_local or preference=_primary
doesn't have any effect.
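For example, direct GETs like the following (index, type and id are made
up) show the same behaviour with and without the preference parameter:

  curl -XGET 'http://localhost:9200/index2/C/some-id?preference=_local'
  curl -XGET 'http://localhost:9200/index2/C/some-id?preference=_primary'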
=== Question ===
I'd like any input on the cause of the slow GET requests. The cluster
obviously has a lot to deal with, but it seems strange that the exact same
request can take only a couple of milliseconds one moment and several
seconds a short while later.
I'm particularly interested in whether the best course of action would be
to:
A) Increase server resources
B) Set a lower ttl for B documents, thereby drastically decreasing the
total number of documents in the cluster
C) Any configuration changes we should make
D) Whether the "corrupt" documents for which updates fail could be the
cause of the problem
Thanks in advance for any input!
Joel