We've run some more tests, using scripts to run a series of queries.
BigDesk did not show the CPU being overly loaded, except for the odd
spike, which is probably GC related. After a few queries in the test
sequences, 1-word queries returned within 4-5ms on average, which is to be
expected. Another run using 5-sentence queries returned within 300ms on
average, which is still good considering the query term count.
Since CPU load does not seem to be an issue, it looks a lot like our
queries are indeed IO bound, at least in a "cold" scenario. As to the
question of storage type: all EC2 instances we use are listed as "high
performance" IO, which Amazon says is SSD storage.
Heap memory looks fine as well; it's set to just about half the total
memory on each machine (as described in the original post). Memory usage
hardly changed at all during testing. GC runs frequently, but it does not
seem to have any consistent impact on search performance.
Do you have any suggestions on how to keep all segments warm at all times,
so we can avoid the initial spiky response times? The warmer API is obviously
a good start. However, we would have to ensure that the combination of
warming queries and contained terms would hit all available shards and
segments. I suspect a match_all query wouldn't work, as there is no actual
scoring involved and Lucene would take a shortcut to just return documents
and never even touch the indirect term index as in usual search scenarios.
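A minimal sketch of a warmer that runs scored term queries instead of match_all. It only builds the request; sending it is left out. The index name "myindex", the warmer name, the field "body" and the sample terms are all hypothetical, and the `PUT /{index}/_warmer/{name}` endpoint is assumed to be the warmer API of the ES version in use.

```python
import json


def build_warmer(terms, field="body"):
    """Build a warmer body that scores against real terms.

    Each term becomes a "should" clause, so the term dictionary and
    postings for that field are actually read while warming.
    """
    return {
        "query": {
            "bool": {
                "should": [{"match": {field: term}} for term in terms]
            }
        }
    }


def warmer_request(index, name, terms, field="body"):
    """Return (method, path, json_body) for registering the warmer."""
    path = f"/{index}/_warmer/{name}"
    return "PUT", path, json.dumps(build_warmer(terms, field))


if __name__ == "__main__":
    # Hypothetical index, warmer name and terms, for illustration only.
    method, path, body = warmer_request(
        "myindex", "scoring_warmer", ["search", "performance"]
    )
    print(method, path)
    print(body)
```

Picking terms that are frequent in the corpus should make the warmer touch more of the index, but how much of each segment ends up in the filesystem cache would still have to be measured.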
Another aspect which I hadn't mentioned before is that search load on our
cluster is currently rather low. Most operations on ES are simply filters
and we are only now fully leveraging actual search and scoring. In full
swing of the application, with lots of searches going on, I would expect an
automatically warmed up system most of the time. However, we would like to
ensure fast search times 24/7, not just in peak periods. The user
experience matters all the time.
We've thought about adjusting the merge policy to have fewer segments, which
appears to yield better search performance (disk caching related?). Then
again, we need to rely on a 1 second refresh interval and merges would
become quite costly. Any suggestions on this?
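For reference, a rough sketch of the kind of settings change we have in mind. The setting names are those of the tiered merge policy as far as we know; the values are purely illustrative, "myindex" is a hypothetical index name, and whether these can be changed on a live index depends on the ES version.

```python
import json


def merge_settings(segments_per_tier=5, max_merge_at_once=5):
    """Build index settings that aim for fewer segments per shard.

    Lower segments_per_tier means more merging and fewer segments.
    max_merge_at_once is kept <= segments_per_tier to avoid forcing
    excessive merges.
    """
    if max_merge_at_once > segments_per_tier:
        raise ValueError("max_merge_at_once should not exceed segments_per_tier")
    return {
        "index.merge.policy.segments_per_tier": segments_per_tier,
        "index.merge.policy.max_merge_at_once": max_merge_at_once,
    }


def settings_request(index, **kwargs):
    """Return (method, path, json_body) for the update-settings call."""
    return "PUT", f"/{index}/_settings", json.dumps(merge_settings(**kwargs))


if __name__ == "__main__":
    print(settings_request("myindex"))
```

With a 1 second refresh interval, new small segments keep appearing, so a lower tier size trades extra merge IO for fewer segments at search time.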
Thanks for your input and attention.
On Tuesday, January 15, 2013 16:16:59 UTC+1, Igor Motov wrote:
I think warmer might help. You are correct that because your queries are
changing, elasticsearch caching will not help. However, filesystem caching
might be really useful here. From your description, it sounds like your
queries are IO bound. So, it would be reasonable to try improving disk IO.
Where do you store your indices? Is it EBS, striped EBS, ephemeral, SSD?
On Tuesday, January 15, 2013 12:02:20 AM UTC-5, Otis Gospodnetic wrote:
Interesting. I don't follow how reducing replication will help...
Are your 1-word queries CPU or disk IO bound?
How does the latency change when you repeatedly search for the same word
over and over? (assuming no concurrent indexing and no rapid index
refreshing, just for now)
How large is your heap and how is the JVM/GC doing?
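One way to answer the repeated-query question is a small timing loop that separates the first (cold) run from the later (warm) ones. This is only a sketch: `search` stands for whatever function sends the query to the cluster and is a hypothetical placeholder here.

```python
import time


def time_repeated(search, query, runs=10):
    """Run the same query several times; return latencies in ms."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        search(query)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies):
    """Split the first (cold) latency from the average of the rest."""
    first, rest = latencies[0], latencies[1:]
    warm_avg = sum(rest) / len(rest) if rest else first
    return {"first_ms": first, "warm_avg_ms": warm_avg}


if __name__ == "__main__":
    # Stand-in for a real search call; replace with an actual client.
    def fake_search(query):
        time.sleep(0.001)

    print(summarize(time_repeated(fake_search, "word", runs=5)))
```

A large gap between `first_ms` and `warm_avg_ms` with no indexing going on would point at disk IO and caching rather than CPU.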
ELASTICSEARCH Performance Monitoring - http://sematext.com/spm/index.html
On Monday, January 14, 2013 10:47:29 AM UTC-5, Stefan Rietberger wrote:
Thank you and Radu for your input, much appreciated.
We've run some further tests and the filters are most definitely not
the cause of the long response times. We may still optimize our queries
given your advice, but right now we are facing delays of half a second for
one-term queries, without any filters whatsoever. Using a natural language
question as a query takes a few seconds already, which is way too much,
even if stopwords were matching.
I've done some tests using the warmer API, but since
filtering/sorting/faceting is not the cause of our issue, this did not help.
Our current approach is trying to reduce the replication level from 3 to
1 and adjusting merge policies for search performance. From some research
on this mailing list we've gathered that there's a lot of potential in
these settings. Unfortunately, the application's use case requires a very
tight refresh rate, so that's off the table. If you can think of anything
more, we'd be glad to hear it.
Thanks again and best regards,
On Sunday, January 13, 2013 11:11:48 UTC+1, Jörg Prante wrote:
Range queries on a fine-grained date field (down to seconds) have their
drawbacks. They are very slow on ~40 million docs.
Range queries force all values of a field to be loaded into the cache. You
observe this the first time the query is sent, and your large RAM helps
you compensate for it.
If you want day resolution in dates, integers representing the days
would help a lot; this reduces the cache size and makes range queries faster.
If you want to organize your docs by day, one idea is to create indices
on a per-day basis and select them by index name ("myindexYYYYMMDD"),
including the involved indices via index name wildcards. By using aliasing,
the many day indices could be routed to a single physical index.
If you just want to sort docs by day, think about using a day counter
as a static document boost. No more range queries needed.
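The day-integer and per-day index ideas above could be sketched like this. The helper names are made up for illustration, the "myindex" prefix follows the "myindexYYYYMMDD" pattern mentioned above, and all timestamps are assumed to be timezone-aware and are bucketed in UTC.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def day_number(ts):
    """Whole days since 1970-01-01 UTC, usable as an integer field."""
    return (ts.astimezone(timezone.utc) - EPOCH).days


def daily_index(ts, prefix="myindex"):
    """Name of the per-day index a document with this timestamp goes to."""
    return prefix + ts.astimezone(timezone.utc).strftime("%Y%m%d")


def indices_for_range(start, end, prefix="myindex"):
    """All per-day index names covering start..end (inclusive)."""
    days = day_number(end) - day_number(start)
    return [
        daily_index(start + timedelta(days=i), prefix)
        for i in range(days + 1)
    ]


if __name__ == "__main__":
    ts = datetime(2013, 1, 13, 11, 11, 48, tzinfo=timezone.utc)
    print(day_number(ts))
    print(daily_index(ts))
    print(indices_for_range(ts, ts + timedelta(days=2)))
```

A range over `day_number` values has far fewer distinct terms than one over second-resolution timestamps, which is where the smaller cache footprint would come from.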
And yes, using mixed size nodes does not help ES. ES overall speed will
be determined by the smallest, slowest node.