ElasticSearch search performance question

I know this is difficult to answer - the real answer is always "it depends"
:) - but I am going to go ahead and hope I get some feedback here.

We are mainly using ES to issue terms searches against fields that are
non-analyzed. We are using ES like a key value store, where once the match
is found we parse the _source JSON and return our model. We are doing
contact lookups, searching against (last_name AND (phone_number OR email)).
We are issuing constant_score queries with term filters for the terms
mentioned above. No aggregations, no sorting, no scripts, etc. Using
JMeter, we were maxing out at around 500 search requests / sec, and the
average request took around 7 seconds to complete. When the test would
fire up, the ThreadPool Search Queue would spike to 1000 on each node and
CPU would be maxed out, then once it finished everything would return to
normal. So it appears healthy, and we wouldn't get any errors - just
nowhere close to the performance we are looking for.
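For reference, the query shape is roughly the following (field names and values here are illustrative, not our real data):

```json
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            { "term": { "last_name": "smith" } },
            { "bool": { "should": [
              { "term": { "phone_number": "5551234567" } },
              { "term": { "email": "jsmith@example.com" } }
            ] } }
          ]
        }
      }
    }
  }
}
```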

Setup details

  • Index is 100GB with two different document mappings; roughly 500M
    documents
  • three c3.4xlarge EC2 instances using provisioned-IOPS SSD EBS volumes
    (although NOT RAID 0 - just one big volume)
  • each node has 30GB RAM: 16GB heap, the rest left for the OS
  • we have set mlockall on our instances
  • the main index is split into 6 shards across the 3 nodes
  • Index is read only after it is loaded - we don't update the index ever,
    it is only for querying
  • ES version 1.3.3, Java 1.7.0_51
  • each node has 16 cores and 48 search threads with a queue length
    of 1000

Given we have no stemming or free-text queries - just term matching - how
can we increase the throughput and decrease the response time of our ES
queries? Is 500 requests / sec the top end?
Do we just need many more servers if we really want 3000 requests / sec? I
have read that scaling out is better than scaling up for ES, but it feels
like the current server farm should deliver better performance.

Any help or tuning advice would be really appreciated. We have looked at
many slideshares, blog posts from found.no, elasticsearch.org, etc - and
can't really pinpoint a way to improve our setup.

Thanks!

JD

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/47b93b84-d929-4cad-becd-31581cd4c574%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Term filters are useful for clauses that are frequently reused across queries. Given you are searching for what I imagine are unique records using low-frequency terms, I expect the following to be true of the bitsets being cached by these filters:

  1. they are rarely reused and therefore frequently evicted
  2. they are very wasteful, e.g. millions of bits in each set with only one bit set to 1

For these reasons it might be worth experimenting with term queries and not filters.
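For example, a minimal variant of one clause as a query rather than a cached filter (field name illustrative):

```json
{
  "query": {
    "bool": {
      "must": [ { "term": { "last_name": "smith" } } ]
    }
  }
}
```

Or, keeping the filter but opting out of the cache - if I remember correctly, 1.x term filters accept a `_cache` flag:

```json
{ "term": { "last_name": "smith", "_cache": false } }
```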


It'd help if you could gist/pastebin/etc a query example.

Also, your current ES and Java need updating; there are known issues with
Java < 1.7.0_55, and you will generally see performance boosts running the
latest version of ES.

That aside, what is your current resource utilisation like? Are you seeing
lots of cache evictions, high heap use, high CPU, IO delays?
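You can pull most of those numbers from the node stats API, e.g.:

```
GET /_nodes/stats/indices,jvm,thread_pool?pretty
```

`indices.filter_cache` shows evictions, `jvm` shows heap use, and `thread_pool.search` shows queue depth and rejections.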


Mark,

Thanks for the initial reply. Yes, your assumption about these things being
very specific and thus not likely to have any re-use with regards to
caching is correct. I have attached some screenshots from the BigDesk
plugin which showed a decent snapshot of what the server looked like while
my tests were running. You can see the spikes in CPU, that essentially
covered the duration when the JMeter tests were running.

At a high level, the only thing that seems to be really stressed on the
server is CPU. But that makes me think there is something in my setup,
query syntax, or perhaps the cache eviction rate, etc. that is causing it
to spike so high. I also have concerns about the non-RAID-0 EBS volumes, as
I know that having one large volume does not maximize throughput - however,
just looking at the stats, it doesn't seem like IO is really a bottleneck.

Here is a sample query structure
=> https://gist.github.com/jaydanielian/c2be885987f344031cfc

Also this is one query - in reality we use _msearch to pipeline several of
these queries in one batch. The queries also include custom routing / route
key to make sure we only hit one shard.
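For illustration, a batch looks roughly like this (index name and routing values made up):

```
POST /contacts/_msearch
{"routing": "a1b2"}
{"query": {"constant_score": {"filter": {"term": {"last_name": "smith"}}}}}
{"routing": "c3d4"}
{"query": {"constant_score": {"filter": {"term": {"email": "jsmith@example.com"}}}}}
```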

Thanks!

J


You might want to try hitting hot threads while the load is running and
see what you get. Or post it here.
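For reference, that endpoint looks like:

```
GET /_nodes/hot_threads?threads=3&interval=500ms
```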

Nik


As requested here is a dump of the hot threads output.

Thanks!

J


How many replicas do you have configured for the index?

Christian


We have three nodes and 6 primary shards, with 1 replica configured for the
index. Here are the settings:

"index" : {
"number_of_replicas" : "1",
"number_of_shards" : "6",
"refresh_interval" : "60",
"version" : {
"created" : "1030399"
},
"merge" : {
"policy" : {
"merge_factor" : "30"
}
}
}


So I can see in the hot threads dump the initialization requests for those
FixedBitSets I was talking about.
Looking at the number of docs in your index, I estimate each term to be
allocating ~140MB of memory in total for all these bitsets across all
shards, given the >1bn docs in your index. Remember that you are probably
setting only a single bit in each of these large structures.
Another stat (if I read it correctly) shows >5M evictions of these cached
filters, given their low reusability. It's fair to say you have some cache
churn going on :)
Did you try my earlier suggestion of queries not filters?
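To sketch that back-of-envelope memory math (this is just the one-bit-per-document approximation, not exact Lucene accounting):

```python
def bitset_mb(doc_count):
    """Approximate size in MB of a FixedBitSet covering doc_count docs:
    one bit per document, regardless of how few bits are actually set."""
    return doc_count / 8 / 1024 / 1024  # bits -> bytes -> MB

# ~500M primary docs, roughly doubled by the replica => ~1bn docs
# carrying bitsets across all shards
print(round(bitset_mb(1_000_000_000)))  # ~120MB per cached term
```

A single cached clause therefore costs on the order of a hundred megabytes of heap to record, typically, one matching document.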

Thanks to all for these great suggestions. I haven't had a chance to change
the query syntax yet, as that is a risky thing to change quickly against our
production setup. My plan is to try it this weekend (so I can properly test
that the new syntax returns the same results). However, is there a way to
turn filter caching off globally, via config or elsewhere?

Thanks!

J
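
As far as I know there was no single global off-switch in 1.3; the usual opt-out is the per-filter `"_cache": false` flag in the query DSL. A minimal sketch of the lookup with caching disabled on each term filter (field names and values below are placeholders, not the real mapping):

```python
import json

# Sketch: constant_score lookup with per-filter caching disabled via
# "_cache": false (the per-filter opt-out in the Elasticsearch 1.x DSL).
# Field names and values are placeholders, not the real mapping.
body = {
    "query": {
        "constant_score": {
            "filter": {
                "bool": {
                    "must": [
                        {"term": {"last_name": "danielian", "_cache": False}},
                        {
                            "bool": {
                                "should": [
                                    {"term": {"phone_number": "5551234567",
                                              "_cache": False}},
                                    {"term": {"email": "jay@example.com",
                                              "_cache": False}},
                                ]
                            }
                        },
                    ]
                }
            }
        }
    }
}

print(json.dumps(body, indent=2))
```

When serialized, Python's `False` becomes JSON `false`, which is what the filter parser expects; the same flag would go on every term filter in each `_msearch` entry.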

Just to update the thread.

I added code to disable caching on all the term filters we were using, and it
made a huge performance improvement. We are now able to service the queries
with an average response time under two seconds, which is excellent (we are
bundling several searches using _msearch, so < 2 seconds total response is
good). The search requests / sec metric still peaks at around 600 / sec, but
our CPU now "only" spikes to about 65% - so I think we can add more search
threads to our config, since we are no longer maxing out CPU. I also see a
bit of disk read activity now, which, against our non-RAID EBS volume, means
we may be able to squeeze out more if we change the disk setup.

It seems that having these filters add cache entries was wasting CPU on cache
evictions and cache lookups (cache misses, really) on every query - something
that only shows up when you try to push some load through.

Thanks for everyone's suggestions!!

J

Good stuff. You're seeing the benefits of not caching lots of single-use
BitSets.
Now if you swap queries in for your filters, you'll also see the benefit of
not allocating multi-megabyte BitSets to hold what is typically a single set
bit per query that you run.
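
A sketch of what that rewrite might look like: term *queries* inside a `bool` query, wrapped in `constant_score` so scoring stays trivial and no filter bitsets are allocated or cached (field names and values are placeholders):

```python
# Sketch of the "queries not filters" rewrite: term queries in a bool
# query instead of term filters, so no FixedBitSets are built or cached.
# constant_score keeps scoring out of the picture.
# Field names and values are placeholders, not the real mapping.
body = {
    "query": {
        "constant_score": {
            "query": {
                "bool": {
                    "must": [
                        {"term": {"last_name": "danielian"}},
                        {
                            "bool": {
                                "should": [
                                    {"term": {"phone_number": "5551234567"}},
                                    {"term": {"email": "jay@example.com"}},
                                ],
                                # require at least one of phone/email to match
                                "minimum_should_match": 1,
                            }
                        },
                    ]
                }
            }
        }
    }
}
```

The shape of the result set is the same as the filtered version - (last_name AND (phone_number OR email)) - it just avoids the bitset allocation path entirely.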

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/af1bcccc-08be-4345-8aac-e456c6adb858%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.