Search performance issues, non-cacheable cases

Hi all,

We are experiencing unusually long response times (~500ms) for simple
one-term queries on our current setup (details below). In our use case,
query caching will rarely help, as we regularly generate new query content
per user (similar to pre-generated "more like this" queries). The
application on top of the search system demands very fast response times
(<100ms including further processing), especially for first requests.

The queries from this gist https://gist.github.com/4494156 take >20s for
query 1 and >11s for query 2 when first executed on our live Elasticsearch
setup. Reducing the query text to a single word still yields first-request
response times of ~1s. Reducing complexity further, by querying only a
single field instead of three, yields the aforementioned ~500ms. Again,
warming up the queries beforehand is not an option (although response
times are excellent afterwards, as expected).

Cluster setup:

  • Nodes: 3 (Amazon EC2 instances)
    • 1x M1 Extra Large Instance
      • 15 GB memory (9 GB for JVM)
      • 8 EC2 Compute Units (4 cores)
    • 2x High-Memory Double Extra Large Instance
      • 34.2 GB memory (15 GB for JVM)
      • 13 EC2 Compute Units (4 cores)
  • Shards: 5
  • Replication level: 3

Index data:

  • Primary shard size: 30.8 GB
  • Number of documents: ~40 million
  • Not optimized
  • Refresh rate: default

Is there anything obviously wrong with either the queries or the setup? Is
there any way to optimize either for "cold" queries?

If you need further information, please let me know. Thank you in advance
for any input.

Best regards,
Stefan

--

Hi Stefan,

The only "wrong" thing I see in your setup is that your nodes are of
different sizes, so I would expect the slower node to become a bottleneck
at some point.

But I think the main reason your "cold" queries are slow is that your OS
caches are cold. On the ES side, queries aren't cached, and your "and"
filters are not cached by default either. So the performance gains on your
subsequent queries should come solely from the OS caches (field caches are
not used, since you don't sort or facet).

Unless I missed something, you can use the Warmers API (if you're on
0.20+) to load your caches after starting ES.
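
For example, something like the following registers a warmer (index name,
warmer name, field and terms are placeholders only, adjust them to your
data):

  curl -XPUT 'http://localhost:9200/myindex/_warmer/warm_text_fields' -d '{
    "query": {
      "match": { "body": "some representative terms" }
    }
  }'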

If you still have performance issues, I think the first step is to monitor
your cluster and see where the bottleneck is. Our SPM is a good tool for
that.

Best regards,
Radu

http://sematext.com/ -- Elasticsearch -- Solr -- Lucene

--

Range queries on a fine-grained date field (down to seconds) have their
drawbacks; they are very slow on ~40 million docs.

Range queries force all values of the field to be loaded into the cache.
You observe this the first time a query is sent, and your large RAM helps
you compensate for it.

If you only need day resolution in your dates, integers representing the
days would help a lot: they reduce the cache size and make range query
computation faster.
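
As a sketch of the idea (index, type and field names are made up; the
numbers are days since epoch for mid-January 2013):

  # map a "day" field as a plain integer, e.g. days since epoch
  curl -XPUT 'http://localhost:9200/myindex/doc/_mapping' -d '{
    "doc": { "properties": { "day": { "type": "integer" } } }
  }'

  # range over whole days instead of a second-resolution date field
  curl -XPOST 'http://localhost:9200/myindex/doc/_search' -d '{
    "query": { "range": { "day": { "gte": 15718, "lte": 15725 } } }
  }'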

If you want to organize your docs by day, one idea is to create indices on
a per-day basis ("myindexYYYYMMDD") and select the involved indices by
index name wildcards. By using aliases, the many day indices could also be
routed to a single physical index.
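
For instance (names and the "day" field are again illustrative), a
filtered alias per day can point at one physical index:

  curl -XPOST 'http://localhost:9200/_aliases' -d '{
    "actions": [
      { "add": {
          "index": "myindex",
          "alias": "myindex20130113",
          "filter": { "term": { "day": "20130113" } }
      } }
    ]
  }'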

If you just want to sort docs by day, think about using a day counter as a
static document boost; then no range queries are needed at all.

And yes, using mixed-size nodes does not help ES. Overall ES speed will be
determined by the smallest, slowest node.

Jörg

--

Thank you and Radu for your inputs, much appreciated.

We've run some further tests, and the filters are most definitely not the
cause of the long response times. We may still optimize our queries based
on your advice, but right now we are facing delays of half a second for
one-term queries without any filters whatsoever. Using a natural-language
question as a query already takes a few seconds, which is way too much
even if stopwords were matching.

I've done some tests using the warmer API, but since
filtering/sorting/faceting is not the cause of our issue, this did not
help, unfortunately.

Our current approach is to reduce the replication level from 3 to 1 and to
adjust merge policies for search performance; from research on this
mailing list we've gathered that there's a lot of potential in these
settings. Relaxing the refresh rate, unfortunately, is off the table, as
the application's use case requires a very tight refresh interval. If you
can think of anything more, we'd be glad to hear it.
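
For reference, the replica reduction itself is just a dynamic settings
update along these lines ("myindex" stands in for our real index name):

  curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{
    "index": { "number_of_replicas": 1 }
  }'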

Thanks again and best regards,
Stefan

--

Hi,

Interesting. I don't follow how reducing replication will help...

Are your 1-word queries CPU or disk IO bound?
How does the latency change when you repeatedly search for the same word
over and over? (assuming no concurrent indexing and no rapid index
refreshing, just for now)
How large is your heap and how is the JVM/GC doing?
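
For example, plain Linux tools on each node usually answer the CPU-vs-IO
question (nothing ES-specific here):

  # high %iowait / await while queries run points at disk IO
  iostat -x 2

  # a busy java process with low iowait points at CPU (or GC) instead
  top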

Otis

ELASTICSEARCH Performance Monitoring - Sematext

--

I think warmers might help. You are correct that because your queries are
changing, Elasticsearch caching will not help. However, filesystem caching
might be really useful here. From your description, it sounds like your
queries are IO bound, so it would be reasonable to try improving disk IO.
Where do you store your indices? Is it EBS, striped EBS, ephemeral, or SSD?

Igor

--

Hi Otis,
Hi Igor,

We've run some more tests, running series of queries from scripts.
Observing BigDesk did not show the CPU being overly loaded, except for the
odd spike, which is probably GC-related. After a few queries in the test
sequences, one-word queries returned within 4-5ms on average, which is to
be expected. Another run using five-sentence queries returned within
~300ms on average, which is still good considering the query term count.

Since CPU load does not seem to be an issue, it looks a lot like our
queries are indeed IO bound, at least in the "cold" scenario. As to the
question of storage type: all the EC2 instances we use are listed as "high
performance" IO, which Amazon says is SSD storage.

Heap memory looks fine as well; it's set to just about half the total
memory on each machine (as described in the original post), and memory
usage hardly changed at all during testing. GC runs frequently, but it
does not seem to have any consistent impact on search performance.

Do you have any suggestions on how to keep all segments warm at all times,
so we can avoid the initial spiky response times? The warmer API is
obviously a good start. However, we would have to ensure that the
combination of warming queries and the terms they contain hits all
available shards and segments. I suspect a match_all query wouldn't work:
there is no actual scoring involved, so Lucene would take a shortcut, just
returning documents without ever touching the term index the way a usual
search does.

Another aspect I hadn't mentioned before is that search load on our
cluster is currently rather low. Most operations on ES are simply filters,
and we are only now fully leveraging actual search and scoring. In full
swing of the application, with lots of searches going on, I would expect
the system to stay warmed up automatically most of the time. However, we
would like to ensure fast search times 24/7, not just in peak periods; the
user experience matters all the time.

We've thought about adjusting the merge policy to produce fewer segments,
which appears to yield better search performance (disk caching related?).
Then again, we need to rely on a 1-second refresh interval, and merges
would become quite costly. Any suggestions on this?
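
Concretely, the knobs we are looking at are along these lines (this
assumes the tiered merge policy; the exact setting names may differ
between ES versions, and some may only be settable at index creation):

  curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{
    "index": {
      "refresh_interval": "1s",
      "merge.policy.segments_per_tier": 5,
      "merge.policy.max_merge_at_once": 5
    }
  }'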

Thanks for your inputs and attention.

Best regards,
Stefan

--

Hi Stefan,

> it looks a lot like our queries are indeed IO bound, at least in a
> "cold" scenario.

Can you see/confirm that in BigDesk or SPM for ES or anything else?

Have you looked at the slow query log? That may point out, for example,
that some shards are slower than others and are thus the bottleneck.
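
If not, thresholds along these lines in elasticsearch.yml (values are just
examples) will log every query that crosses them, per shard:

  index.search.slowlog.threshold.query.warn: 1s
  index.search.slowlog.threshold.query.info: 500ms
  index.search.slowlog.threshold.fetch.warn: 500ms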

> Most operations on ES are simply filters

Any fields you filter on (or sort or facet on) should be used in your
warmup queries. It's not clear to me whether you are already doing this.

> thought about adjusting the merge policy to have fewer segments which
> appears to yield better search performance (disk caching related?)

I would expect the opposite. If Lucene is merging segments more
frequently, the disk is doing more work, and the FS cache entries for
blocks from the old segments being merged are invalidated, because after
the merge those segments/files/blocks are gone.

Otis

ELASTICSEARCH Performance Monitoring - Sematext
