Slow Query Performance

We're seeing query performance that is surprisingly slow. Running 10 nodes
with 0.19.10 on AWS EC2. Each instance is an m1.xlarge with 15GB of RAM and
8x 60GB EBS volumes in RAID-0 for data storage.

ES_MIN_MEM=8192M
ES_MAX_MEM=14000M

We have 8 indexes with a grand total of roughly 110GB of data. Each index
has 20 shards and 3 replicas. The largest index is roughly 55GB and has 35
million records. Queries against this index typically take over 1.5 seconds:

---snip---
time curl -XGET 'http://es.dev.example.com:9200/indexname/_search?pretty=true' -d '{
  "query" : {
    "match" : {
      "term" : {
        "query" : "Stonehenge"
      }
    }
  }
}'
{
  "took" : 1385,
  "timed_out" : false,
  "_shards" : {
    "total" : 20,
    "successful" : 20,
    "failed" : 0
  },
  "hits" : {
    "total" : 545,
    ...
  }
real    0m1.676s
user    0m0.011s
sys     0m0.006s
---snip---

In this case, "term" is a stored string value. System load is low - very
little CPU utilization and light disk I/O. I was really hoping we would see
simple queries take less than 250ms, so I'm pretty shocked by the
performance.

What should we look at to determine why the queries are slow? And are there
any strategies to improve performance?

Any advice would be greatly appreciated.

-Sean

--

Hey Sean,

On similar indices this query returns in something like 10ms or less. Can
you give me more information about how much data you are returning (the
fields), whether you are doing highlighting, etc.? I also wonder how big
your documents are and what search type you are running, i.e. QueryThenFetch?

simon


--

Heya

To add to what Simon said:

On Thursday, October 18, 2012 2:06:53 AM UTC+2, VegHead wrote:
We're seeing query performance that is surprisingly slow.
Running 10 nodes with 0.19.10 on AWS EC2. Each instance is an
m1.xlarge with 15GB of RAM and 8x 60GB EBS volumes in RAID-0
for data storage.

    ES_MIN_MEM=8192M
    ES_MAX_MEM=14000M

You really want to configure ES_HEAP_SIZE = 60% of total memory,
because:

  • you want to lock the heap into memory at startup - no fragmentation
    and no swapping. Make sure you have ulimit -l unlimited, and
    bootstrap.mlockall: 1 (a minimal sketch of these settings follows below)
  • you want to leave at least 40% of your memory for the kernel filesystem
    caches
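
Purely for illustration, a minimal sketch of those settings (the 9g figure
is just an assumption for a 15GB node, not a recommendation):

    # roughly 60% of RAM; used instead of setting ES_MIN_MEM / ES_MAX_MEM separately
    export ES_HEAP_SIZE=9g

    # let the user running ES lock memory (see also /etc/security/limits.conf)
    ulimit -l unlimited

    # and in config/elasticsearch.yml:
    #   bootstrap.mlockall: 1
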
    We have 8 indexes with a grand total of roughly 110GB of data.
    Each index has 20 shards and 3 replicas. The largest index is
    roughly 55GB and has 35 million records. Queries against this
    index typically take over 1.5 seconds:

That's 64 shards per node (8 indexes x 20 shards x 4 copies each = 640
shards across 10 nodes)! And your nodes are on the smallish side!

Why do you have 20 primary shards per index? That's a lot, especially for
an average of 600MB per primary shard. And 3 replicas? You're using a lot
of memory just keeping shards alive.

I'd reduce the number of primaries to e.g. 5 each, and turn the number of
replicas down to 2 or even 1 (although on EC2, you do run an increased
risk of multiple machine failure).
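
Purely as a sketch - the index name is made up; note that the number of
primaries can only be set when an index is created, while replicas can be
changed later:

    curl -XPUT 'http://localhost:9200/newindex' -d '{
      "settings" : {
        "number_of_shards" : 5,
        "number_of_replicas" : 1
      }
    }'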

    In this case, "term" is a stored string value.

No reason for it to be stored. That has nothing to do with the ability
to search on it.
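
For example, a mapping along these lines keeps the field searchable without
storing it (the index and type names are just placeholders, and not-stored
is the default anyway):

    curl -XPUT 'http://localhost:9200/myindex' -d '{
      "mappings" : {
        "mytype" : {
          "properties" : {
            "term" : { "type" : "string", "store" : "no" }
          }
        }
      }
    }'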

clint

--

Hi,
As far as search type goes, the example I showed was just using curl...
so... whatever Elasticsearch defaults to. We haven't explicitly defined any
default search type, so I don't know what search type it's using.
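
(If it helps, I believe the search type can be forced on the URL; a sketch
reusing the hostname and index from my earlier example, just to pin it to
the default query_then_fetch, would be something like:)

---snip---
time curl -XGET 'http://es.dev.example.com:9200/indexname/_search?search_type=query_then_fetch&pretty=true' -d '{
  "query" : { "match" : { "term" : { "query" : "Stonehenge" } } }
}'
---snip---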

I should have included an example of the data... it's pretty small. Here's
an example:

---snip---
{
  "_index" : "ideas",
  "_type" : "idea",
  "_id" : "v6nZyBjnTOmoY8ocFgYUcw",
  "_score" : 12.482641,
  "_source" : {
    "networks" : 38777,
    "creator_last_name" : "Tan",
    "definition" : "Example of an archeological site (Neolithic origination, finished in the Bronze Age); located in England",
    "time_created" : "2012-05-22T00:25:05.090Z",
    "network_name" : "Blah blah Blahl",
    "original_card_id" : 81897377,
    "frequency" : 1,
    "families" : 293466,
    "creator_id" : 1112643,
    "creator_first_name" : "Howard",
    "authors" : 1112643,
    "idea_signature" : "-23795688954730064,-3778677691914660",
    "num_views" : 0,
    "time_updated" : "2012-05-22T00:25:05.107Z",
    "network_id" : 38777,
    "term" : "Stonehenge",
    "original_document_id" : 3018120,
    "term_signature" : "-23795688954730064",
    "media" : "http://example.com/images/stonehenge1334597688387.jpg",
    "image_fill" : "1"
  }
}
---snip---

No highlighting.

-Sean


--

Hi Clinton,

On Thursday, October 18, 2012 1:09:53 AM UTC-7, Clinton Gormley wrote:

To add to what Simon said:

On Thursday, October 18, 2012 2:06:53 AM UTC+2, VegHead wrote:
We're seeing query performance that is surprisingly slow.
Running 10 nodes with 0.19.10 on AWS EC2. Each instance is an
m1.xlarge with 15GB of RAM and 8x 60GB EBS volumes in RAID-0
for data storage.

    ES_MIN_MEM=8192M 
    ES_MAX_MEM=14000M 

You really want to configure ES_HEAP_SIZE = 60% of total memory,
because:

  • you want to lock the heap into memory at startup - no fragmentation
    and no swapping. make sure you have ulimit -l unlimited, and
    bootstrap.mlockall: 1
  • you want to leave at least 40% of your memory to kernel filesystem
    caches

Ahhh. Very useful to know. So when we're doing capacity planning, we should
also assume that only 60% of the memory on a node is actually available for
storing indexes?

    We have 8 indexes with a grand total of roughly 110GB of data. 
    Each index has 20 shards and 3 replicas. The largest index is 
    roughly 55GB and has 35 million records. Queries against this 
    index typically take over 1.5 seconds: 

That's 64 shards per node! And your nodes are on the smallish side!

Why do you have 20 primary shards per index? That's a lot, esp for an
average of 600MB per primary shard. And 3 replicas? You're using a lot
of memory just keeping shards alive.

I'd reduce the number of primaries to eg 5 each. And turn the number of
replicas down to 2 or even 1 (although in EC2, you do run an increased
risk of multiple machine failure).

We have 20 shards per index specifically for future growth and scalability.
I/O performance on EBS volumes tends to be the killer problem with EC2.
Having only 5 shards per index dramatically limits our ability to scale
horizontally, which is a large part of why we picked 20. Why would going
down to 5 primaries help?

We could reduce the number of replicas - 3 is probably overkill. However,
search functionality is pretty critical within our products. I'd rather pay
a little extra to increase reliability and we do have EC2 instances fail
regularly. And having more replicas makes it easier to scale up/down to
meet demand. So we could go down to 2 replicas, but I'd prefer not to go
down to only 1 replica.

So are you saying you think we don't have enough RAM on each node? I can
see that might be possible, but I don't see a blip on the I/O stats on the
servers when performing queries.

    In this case, "term" is a stored string value.

No reason for it to be stored. That has nothing to do with the ability
to search on it.

I didn't think so, but I wasn't sure.

-Sean

--

No. You should measure the working and peak memory footprints of each
release. You should then add some safety factor, set min and max equal, and
mlockall. Did it ever occur to anyone that he is swapping indices
indiscriminately? In Lucene 4.x you will be able to swap 128-byte segments
this way. You want your JVM to be resident in memory at all times.

You will also need to set the user limits for the user running ES to
unlimited files and memory. Just to make it simple.

g'luck,


--

Hi Sean

We have 20 shards per index specifically for future growth and
scalability. I/O performance on EBS volumes tends to be the killer
problem with EC2. Having only 5 shards per index dramatically limits
our ability to scale horizontally. That's a large part of why we
picked 20 shards specifically. Why would going down to 5 primaries
help?

This is known as the Kagillion Shards solution to capacity planning :)

You should spend some time watching this video of Kimchy at Berlin
Buzzwords 2012, where he talks about patterns for scaling:

clint

--

On Friday, October 19, 2012 3:24:45 AM UTC-7, Clinton Gormley wrote:

We have 20 shards per index specifically for future growth and
scalability. I/O performance on EBS volumes tends to be the killer
problem with EC2. Having only 5 shards per index dramatically limits
our ability to scale horizontally. That's a large part of why we
picked 20 shards specifically. Why would going down to 5 primaries
help?

This is known as the Kagillion Shards solution to capacity planning :)

You should spend some time watching this video of Kimchy at Berlin
Buzzwords 2012, where he talks about patterns for scaling:


Okay. Read the slides. Watched most of the video. Doing better, but still
confused. :)

Key takeaways from this discussion and the presentation:

  • Data should fit in RAM
  • Memory allocated to ES should be 60% of available RAM (e.g. 9GB on a 15GB
    EC2 instance)
  • Replicas really only provide search benefits with a larger number of
    instances (e.g. instance count > shard count).
  • Shards are not free - there are costs associated with each shard.
  • Kagillion shards are actually expensive to search across

By reducing our replica count from 3 to 1, we reduced the memory
requirements per instance dramatically. That, in turn, improved query
performance by a factor of 2-5 the first time we run a query. Queries that
were taking 1.5s to 3.0s now take 250ms to 700ms. Once the data is cached,
those same queries run in under 25ms. We'll see how that performs over
time, but that's a damn good improvement.
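
(For anyone following along: the replica count can be changed on a live
index with the update-settings API; something along these lines, reusing
the hostname and index name from my earlier example:)

---snip---
curl -XPUT 'http://es.dev.example.com:9200/indexname/_settings' -d '{
  "index" : { "number_of_replicas" : 1 }
}'
---snip---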

However, I'm still stumped on setting shard count. The problem is threefold.
First, our usage is highly cyclical. It varies throughout the day, but it
also varies during the year. Since we're focused on students, we see 4x to
10x spikes in usage around finals (December & May). The ideal configuration
now is not necessarily the ideal configuration during finals. Second, our
user base is rapidly increasing, so search utilization is increasing as
well. Third, our data doubles every couple of months. If we set the shard
count to the "right" size now, it's going to be wrong in the
not-too-distant future... which means we'll have a hassle on our hands. We
definitely want to over-allocate so we can avoid having to do a massive
reindex every couple of months.

That said, I'm still confused by what Shay means by "just use one shard and
start loading it". How do I know when I have hit the "limit"?

-Sean

--

Hi,

That said, I'm still confused by what Shay means by "just use one shard and
start loading it". How do I know when I have hit the "limit"?

When, at some query rate, the response latency goes over your desired max.
And before that you can start watching key metrics, such as CPU usage, disk
IO, IO waits, memory usage/swapping... (check my sig).
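
A quick way to eyeball some of these from the shell (the ES endpoint below
is from memory for 0.19.x, so double-check the exact path):

    # OS level: CPU, IO wait, swapping
    vmstat 5
    iostat -x 5

    # ES level: per-node JVM, cache and IO stats
    curl 'http://localhost:9200/_cluster/nodes/stats?pretty=true'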

Otis

Search Analytics - Cloud Monitoring Tools & Services | Sematext
Performance Monitoring - Sematext Monitoring | Infrastructure Monitoring Service


--

By reducing our replica count from 3 to 1, we reduced the memory
requirements per instance dramatically. That, in turn, improved query
performance by a factor of 2-5 the first time we run a query. Queries
that were taking 1.5s to 3.0s now take 250ms to 700ms. Once the data
is cached, those same queries run in under 25ms. We'll see how that
performs over time, but that's a damn good improvement.

Indeed - nice result!

However, I'm still stumped on setting shard count. The problem is
threefold. First, our usage is highly cyclical. It varies throughout
the day, but it also varies during the year. Since we're focused on
students, we see 4x to 10x spikes in usage around finals (December &
May). The ideal configuration now is not necessarily the ideal
configuration during finals. Second, our user base is rapidly
increasing, so search utilization is increasing as well. Third, our
data doubles every couple of months. If we set the shard count to the
"right" size now, it's going to be wrong in the not-too-distant
future... which means we'll have a hassle on our hands. We definitely
want to over-allocate so we can avoid having to do a massive reindex
every couple of months.

This is the point at which you need to decide on which model best fits
your data. I'm guessing that the type of data you have is not suited to
time-based indices (although I don't know anything about what you do, so
I may be wrong).

So that leaves you with the index-per-user model. This is a very
flexible way of managing your data, but does require a bit of
forethought.

The key to this is: querying 2 indices with 5 shards each is the same as
querying 5 indices with 2 shards each. Either way you query 10 shards
in total.

Aliases can point at multiple indices. This means that you can add new
indices with more capacity as and when you need to, and just update the
aliases to include those indices.

To map out a scenario:

  1. Create one index with 5 shards

  2. Decide on what constitutes a "user"
    A user can be any grouping of your data. For instance it may be per
    actual user, per country, whatever.

  3. Create a new_doc alias per "user".

    CRUD calls (get, index, etc) can only be performed on aliases
    that point at a single index.

    So you can use your single-index new_doc alias to create new
    documents. When updating existing docs, use the real _index (and
    _routing) rather than an alias.

  4. Create a query alias per "user".

    Queries can be run on aliases that point at multiple indices.
    Your query alias can point to the same index as the new_doc alias,
    plus any older indices relevant to the "user"

This way, as you need to grow, you can add more indices and incorporate
them into the bigger scheme. (Note: I've left out all the alias
routing/filtering which is discussed in the video)
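
To make steps 3 and 4 concrete, a bare-bones sketch with the aliases API
(all index and alias names here are made up):

    # new_doc points at exactly one index; the query alias can span old and new
    curl -XPOST 'http://localhost:9200/_aliases' -d '{
      "actions" : [
        { "add" : { "index" : "idx_0002", "alias" : "user_123_new_doc" } },
        { "add" : { "index" : "idx_0002", "alias" : "user_123_query" } },
        { "add" : { "index" : "idx_0001", "alias" : "user_123_query" } }
      ]
    }'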

clint

--