Just wanted to post how ecstatic I am with search performance.
My h/w is 24 CPU cores w/ 48GB of RAM. I have 45 indexes total, with
the largest getting a couple of shards, for a total of 75 shards. I
have 20 million documents with an index size of ~28GB.
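(For anyone curious how the shard counts get set: nothing exotic. A
minimal sketch of creating an index with an explicit primary shard
count over the REST API; the index name and numbers below are
illustrative, not my actual settings.)

import json
import urllib.request

# Hypothetical index with 2 primary shards, like my larger indexes get.
settings = {"settings": {"index": {"number_of_shards": 2,
                                   "number_of_replicas": 0}}}
req = urllib.request.Request(
    "http://localhost:9200/big_index",
    data=json.dumps(settings).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="PUT",
)
print(urllib.request.urlopen(req).read().decode())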
I am able to pump through ~450 real-world queries per second. The
queries are very large boolean queries, some with wildcard and prefix
clauses, and they represent 10+ years of legacy queries that are far
from optimal in most cases. These are not the standard simple test
queries.
Also, these are queries that have made it past our caching layer,
implying that there is a lot of variation in the requests.
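To give a flavor of the shape (not the size) of these queries, here's
a toy stand-in that mixes boolean logic with prefix and wildcard
clauses. Field names and values are made up; the real queries are far
larger:

import json
import urllib.request

# Toy stand-in for one legacy query: exact terms combined with
# prefix and wildcard clauses. Fields and values are made up.
query = {
    "query": {
        "bool": {
            "must": [
                {"term": {"status": "published"}},
                {"prefix": {"author": "steph"}},
            ],
            "should": [
                {"wildcard": {"body": "*perform*"}},
            ],
        }
    }
}
req = urllib.request.Request(
    "http://localhost:9200/big_index/_search",
    data=json.dumps(query).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).read().decode())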
The obvious bottleneck ends up being CPU saturation. Disk load looks
negligible.
If I start a flood of document indexing, things slow down a little,
but not much.
ES consumes ~10GB of JVM heap in this setup.
I'm assuming that the system is using the rest of the RAM for disk
cache.
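If you want to sanity-check the heap figure yourself, the nodes stats
API reports per-node JVM memory. A rough sketch (the exact endpoint
path differs across ES versions):

import json
import urllib.request

# Ask each node how much JVM heap it is using vs. its configured max.
stats = json.load(
    urllib.request.urlopen("http://localhost:9200/_nodes/stats/jvm"))
for node_id, node in stats["nodes"].items():
    mem = node["jvm"]["mem"]
    print(node_id,
          mem["heap_used_in_bytes"] // 2**30, "GB used /",
          mem["heap_max_in_bytes"] // 2**30, "GB max")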
Hi Stephen,
This is a single machine: 24 cores, 48GB of RAM. And I have further
revised numbers of over 1500 queries per second, with only a handful
(less than 0.1%) running over 2 seconds and an average of 34ms.
To add a comparison point, our current legacy search system (which I
would name if I could) can pump through ~80 of the same queries per
second against the same data set on semi-similar h/w.
The really cool thing is that there just does not seem to be a
bottleneck in my config other than CPU. I was really expecting to have
to tweak things to work around memory or disk bottlenecks, but that
has not been the case.
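For what it's worth, the load driver is nothing fancy. Roughly this
shape, with placeholder URL, queries, and thread count: worker threads
replay captured queries, record per-request wall-clock latency, and
the totals give throughput and the tail:

import json
import time
import threading
import urllib.request

URL = "http://localhost:9200/big_index/_search"  # placeholder
QUERIES = [{"query": {"term": {"status": "published"}}}] * 500  # stand-ins
latencies = []
lock = threading.Lock()

def worker():
    for q in QUERIES:
        req = urllib.request.Request(
            URL, data=json.dumps(q).encode("utf-8"),
            headers={"Content-Type": "application/json"})
        start = time.time()  # wall clock per request
        urllib.request.urlopen(req).read()
        elapsed = time.time() - start
        with lock:
            latencies.append(elapsed)

threads = [threading.Thread(target=worker) for _ in range(8)]
wall_start = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
wall = time.time() - wall_start

print("queries/sec: %.0f" % (len(latencies) / wall))
print("avg latency: %.0fms" % (1000 * sum(latencies) / len(latencies)))
print("frac > 2s: %.4f" % (sum(l > 2 for l in latencies) / len(latencies)))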
OK, major DOH! on my part. Current numbers are ~500 queries per sec,
averaging ~110ms each, with a fraction of a percent running over 2
seconds.
My initial numbers of ~500 queries per sec were run from Python
scripts on Windows boxes. I moved these scripts over to more powerful
Linux boxes to try to eliminate any client-side bottleneck.
Unfortunately, time.clock() behaves dramatically differently on
Windows vs Linux, and the larger numbers I posted were a fantasy.
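For anyone who hits the same trap: on Python 2, time.clock() is
wall-clock time on Windows but process CPU time on Linux. A client
that spends most of its time waiting on the network burns almost no
CPU, so on Linux the measured elapsed time is near zero and the
computed queries/sec is wildly inflated. time.time() (or
time.perf_counter() on Python 3, where time.clock() no longer exists)
measures wall clock on both platforms. A quick demo:

import time

# sleep stands in for waiting on a search response (I/O-bound, ~no CPU).
start = time.time()
time.sleep(1.0)
print("time.time():    %.3fs" % (time.time() - start))           # ~1.000s

start = time.perf_counter()
time.sleep(1.0)
print("perf_counter(): %.3fs" % (time.perf_counter() - start))   # ~1.000s

start = time.process_time()  # CPU time: what time.clock() meant on Linux
time.sleep(1.0)
print("process_time(): %.3fs" % (time.process_time() - start))   # ~0.000s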