Search performance

Just wanted to post how ecstatic I am with search performance.

My h/w is 24 CPU cores with 48GB of RAM. I have 45 indexes total, with
the largest getting a couple of shards, for a total of 75 shards. I
have 20 million documents with an index size of ~28GB.

I am able to pump through ~450 real world queries per second. The
queries are very large boolean queries, some with wildcards and prefix
queries and represent 10+ years of legacy queries that are far from
optimal in most cases. These are not the standard simple test queries.
Also, these are queries that have made it past our caching layer,
implying that there is a lot of variation in the requests.
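For readers unfamiliar with what such queries look like, here is a minimal sketch of a large boolean query mixing prefix and wildcard clauses in Elasticsearch's query DSL. The field names and values are entirely made up for illustration; the actual legacy queries are not shown in the thread. Wildcard and prefix clauses expand into many underlying term matches, which is consistent with the CPU-bound behavior described below.

```python
import json

# Hypothetical sketch of a boolean query with wildcard and prefix
# clauses -- field names ("title", "ticker", "body") are invented here.
legacy_query = {
    "query": {
        "bool": {
            "must": [
                {"term": {"title": "earnings"}},
                {"prefix": {"ticker": "goo"}},
            ],
            "should": [
                {"wildcard": {"body": "*merger*"}},
                {"wildcard": {"body": "acquisit*"}},
            ],
            "minimum_should_match": 1,
        }
    },
    "size": 10,
}

# This JSON body would be POSTed to the index's _search endpoint.
print(json.dumps(legacy_query, indent=2))
```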

The obvious bottleneck ends up being CPU saturation. Disk load looks
negligible.

If I start a flood of documents indexing, things slow down a little,
but not much.

ES consumes ~10GB of JVM heap in this setup.

I'm assuming that the system is using the rest of the RAM for disk
cache.
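That heap-vs-page-cache split can be made explicit when launching the node. A hedged sketch only: the `ES_MIN_MEM`/`ES_MAX_MEM` variable names are an assumption about the startup script of ES versions from this era, so check your version's `bin/elasticsearch`; the underlying JVM flags are `-Xms`/`-Xmx` either way.

```shell
# Pin the JVM heap at ~10GB so the remaining ~38GB of the 48GB box is
# left to the OS page cache, which can hold the entire ~28GB index.
# ES_MIN_MEM/ES_MAX_MEM (an assumption here) translate to -Xms10g -Xmx10g.
ES_MIN_MEM=10g ES_MAX_MEM=10g ./bin/elasticsearch
```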

Thanks for the great work!

Paul

And, of course, I was still bounded client side :)

Was able to ramp up to 1000 queries/sec with no warm up. I believe
that my indexes are pegged in system cache.
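A client-side harness for this kind of measurement can be sketched as below. This is not the script from the thread; the worker count and the stubbed-out request are assumptions, with `run_query` standing in for a real HTTP search call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_query(i):
    # Stand-in for an HTTP search request; a real client would POST the
    # query body to Elasticsearch here and parse the response.
    time.sleep(0.001)


N = 2000
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=32) as pool:
    # map() consumes lazily, so wrap in list() to drain all N tasks.
    list(pool.map(run_query, range(N)))
elapsed = time.perf_counter() - start

print(f"{N / elapsed:.0f} queries/sec from this client")
```

If a single client process tops out, the usual next step is to run several such clients in parallel, since a lone driver can easily become the bottleneck, as this thread demonstrates.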

Really awesome.

On Aug 24, 1:06 pm, Paul ppea...@gmail.com wrote:


Is this a single machine (24 cores, 48 GB RAM) or are you clustered
across several machines (3 machines 16GB RAM, 8 cores each)?

On Tue, Aug 24, 2010 at 11:24 PM, Paul ppearcy@gmail.com wrote:


Hi Stephen,
This is a single machine: 24 cores, 48GB of RAM. And I have further
revised numbers: over 1500 queries per second, with only a handful
(less than 0.1%) taking over 2 seconds and an average of 34ms.

To add a comparison point, our current legacy search system (which I
would name if I could) can pump through ~80 of the same queries per
second against the same data set on semi-similar h/w.

The really cool thing is that there just does not seem to be a
bottleneck in my config other than CPU. I was really expecting to have
to tweak things to work around memory or disk bottlenecks, but that has
not been the case.

Thanks,
Paul

On Aug 25, 9:06 am, Stephen Day stevv...@gmail.com wrote:


OK, major DOH! on my part. Current numbers are ~500 queries per sec,
averaging ~110ms each, with a fraction of a percent taking over 2
seconds.

My initial numbers of ~500 queries per sec were run from Python
scripts on Windows boxes. I moved these scripts over to more powerful
Linux boxes to try to eliminate any client-side bottleneck.
Unfortunately, time.clock() behaves dramatically differently on
Windows vs. Linux, and the larger numbers I posted were a fantasy.
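The pitfall here is that time.clock() (removed in Python 3.8) measured wall-clock time on Windows but CPU time on Linux, so a benchmark loop that mostly waits on network I/O reports near-zero elapsed time on Linux and wildly inflates queries/sec. A small demonstration with the modern replacements, which make the distinction explicit:

```python
import time

# perf_counter() measures wall-clock time; process_time() measures CPU
# time consumed by this process. While sleeping (a stand-in for waiting
# on a search response), wall-clock time advances but CPU time does not.
start_wall = time.perf_counter()
start_cpu = time.process_time()

time.sleep(0.2)  # simulates blocking on network I/O

wall = time.perf_counter() - start_wall
cpu = time.process_time() - start_cpu

print(f"wall-clock elapsed: {wall:.3f}s")  # ~0.2s
print(f"CPU time elapsed:   {cpu:.3f}s")   # near zero while sleeping
```

Timing a request loop with the CPU-time clock would divide the request count by that near-zero figure, which is exactly the kind of fantasy number described above. time.perf_counter() is the right clock for throughput measurements.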

Still really happy with current results, though :)

Now to distribute.

On Aug 25, 11:43 am, Paul ppea...@gmail.com wrote:
