I am evaluating ElasticSearch (ES) for a project and it comes close to the
requirements we are looking at. Since I am new to ElasticSearch, I am
trying to figure out a couple of things and have multiple questions.
For comparing ES and a few other data stores, I used YCSB (https://github.com/brianfrankcooper/YCSB) to
run some benchmarks of different kinds of scenarios we foresee right now.
Our application is (relatively) write-heavy and ES seems to meet our
expectations as far as writes are concerned. The test was performed on a
single-node cluster only.
Questions:
I inserted 1 million records in an index with 5 shards. Somehow, I
didn't see any spike in memory usage. Is that something I should
expect?
Apart from the JVM, what else does ES mainly need memory for? For
example, I know that increasing the number of shards per index comes at
the cost of more memory, which I have observed myself. I am asking
because I want to understand under what circumstances memory becomes a
bottleneck.
Does ES keep indices partially or fully in memory for any purpose,
such as speeding up search queries, when the index is not an in-memory index?
Even though READS were extremely fast, SEARCHES were very poor. One
query took more than 500 ms, which troubled me. I thought ES could
easily handle 1 million records and search through them. Is this normal?
Can I improve this on just one node, or will adding more nodes with replicas
help?
What should I use for benchmarking? I am not sure YCSB is the right
tool to benchmark ES, and it does not offer much flexibility.
How should I plan resources in terms of memory and CPU?
In ES, features like sorting and faceting require the most memory.
These features use the field data cache, which lives inside the JVM heap.
These features aren't used during indexing. ES doesn't load the
index into the JVM heap. ES does use the OS filesystem cache
during searching, so in many cases large parts of the index are in
memory (this is why you shouldn't allocate all of your
available memory to the ES JVM process).
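To get a feel for why sorting and faceting are the heap-hungry features, here is a rough back-of-envelope sketch. It assumes that sorting on a field loads about one value per document into the field data cache and it ignores per-segment and data-structure overhead, so the real footprint will be larger. The function name and the 8-byte figure for a numeric field are illustrative assumptions, not ES internals.

```python
def estimate_fielddata_bytes(num_docs: int, bytes_per_value: int, fields: int = 1) -> int:
    """Rough lower bound on heap used when sorting/faceting loads per-document values.

    Hypothetical helper: assumes one value per document per field and ignores
    overhead, so treat the result as a floor, not a prediction.
    """
    if num_docs < 0 or bytes_per_value < 0 or fields < 0:
        raise ValueError("arguments must be non-negative")
    return num_docs * bytes_per_value * fields


if __name__ == "__main__":
    # 1 million documents sorted on one 8-byte numeric field (illustrative numbers)
    size = estimate_fielddata_bytes(1_000_000, 8)
    print(f"{size / 2**20:.1f} MiB")  # 7.6 MiB before overhead
```

The point is the scaling: the estimate grows linearly with document count and with the number of fields you sort or facet on, which is why memory tends to become a bottleneck at search time rather than during indexing.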
Can you share your search request, or describe what types of queries,
facets and features you're using in your search requests? In general,
adding more nodes (and optionally increasing the number of replicas) can
improve query performance. Unfortunately I'm not familiar with YCSB.
The amount of memory you need depends on the type of search requests
and the index size (#docs, #terms, etc.), so it is difficult to tell.
Usually a single node has multiple indices, each index has multiple
shards and each shard has one or more copies. Under the hood, a search
request is usually translated into several shard-level search requests.
So a node is highly concurrent, and having a decent number of CPUs /
CPU cores per node makes sense. I wouldn't invest too much in super
powerful nodes (machines); just add more nodes when you need to
increase search request throughput or improve search performance.
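The fan-out described above can be pictured with a simplified scatter/gather sketch: the coordinating node sends the query to every shard in parallel, each shard returns only its own top hits, and the coordinator merges them into a global top-k. This is an illustrative model, not ES's actual code; the in-memory "shards", the term-count scoring, and the names `search_shard`/`search_all` are hypothetical. The thread pool shows why extra CPU cores per node help: shard-level searches can run concurrently.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shard model: each shard maps doc id -> document text.
Shard = dict[str, str]


def search_shard(shard: Shard, term: str, k: int) -> list[tuple[int, str]]:
    """Score each doc on one shard by how often `term` occurs; return the local top-k."""
    scored = [(text.split().count(term), doc_id) for doc_id, text in shard.items()]
    hits = [hit for hit in scored if hit[0] > 0]
    return heapq.nlargest(k, hits)


def search_all(shards: list[Shard], term: str, k: int) -> list[tuple[int, str]]:
    """Scatter the query to all shards in parallel, then gather and merge the results."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        per_shard = pool.map(lambda s: search_shard(s, term, k), shards)
        # Each shard only sends its own top-k; the global top-k must be among them.
        candidates = [hit for hits in per_shard for hit in hits]
    return heapq.nlargest(k, candidates)


if __name__ == "__main__":
    demo = [{"a": "foo bar foo", "b": "bar"}, {"c": "foo", "d": "foo foo foo"}]
    print(search_all(demo, "foo", 2))  # [(3, 'd'), (2, 'a')]
```

Because each shard trims its results to k before sending them, the merge step stays cheap even when there are many shards; the expensive part is the per-shard work, which is what spreads across cores and, with more nodes, across machines.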
On Tuesday, January 1, 2013 8:27:49 AM UTC-5, Vaidik Kapoor wrote:
Hi,
I am evaluating Elasticsearch (ES) for a project and it comes close to the
requirements we are looking at. Since I am new to Elasticsearch, I am
trying to figure out a couple of things and have multiple questions.
For comparing ES and a few other data stores, I used YCSB (https://github.com/brianfrankcooper/YCSB) to
run some benchmarks of different kinds of scenarios we foresee right now.
Our application is (relatively) write-heavy and ES seems to meet our
expectations as far as writes are concerned. The test was performed on a
single-node cluster only.
Questions:
I inserted 1 million records in an index with 5 shards. Somehow, I
didn't see any spike in memory usage. Is that something I should
expect?
Indexing is not memory heavy. Search can be, if it involves sorting and
faceting.
Apart from the JVM, what else does ES mainly need memory for? For
example, I know that increasing the number of shards per index comes at
the cost of more memory, which I have observed myself. I am asking
because I want to understand under what circumstances memory becomes a
bottleneck.
Each open index has some memory cost. The OS will try to cache as much of
the index as possible (that happens outside the JVM and ES).
Does ES keep indices partially or fully in memory for any purpose,
such as speeding up search queries, when the index is not an in-memory index?
Not explicitly. This is somewhat in your control - if you leave more RAM
to the OS it will be able to cache more of the index.
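One way to act on this advice is to fix the heap size up front and deliberately leave the rest of the RAM to the OS. The sketch below uses the commonly cited rule of thumb of giving the JVM heap about half of the machine's RAM, capped a little under 32 GB so the JVM can keep using compressed object pointers. Both numbers are assumptions to tune for your workload, not fixed ES requirements, and `split_memory` is a hypothetical helper.

```python
def split_memory(
    total_ram_gb: float, heap_fraction: float = 0.5, heap_cap_gb: float = 31.0
) -> tuple[float, float]:
    """Return (jvm_heap_gb, left_for_os_gb) for a machine dedicated to ES.

    heap_fraction and heap_cap_gb are rule-of-thumb assumptions, not ES settings.
    The remainder covers the OS itself plus its filesystem cache.
    """
    if total_ram_gb <= 0:
        raise ValueError("total_ram_gb must be positive")
    heap = min(total_ram_gb * heap_fraction, heap_cap_gb)
    return heap, total_ram_gb - heap
```

For a 16 GB machine this gives an 8 GB heap (the value you would pass to the JVM as `-Xms8g -Xmx8g`), leaving roughly 8 GB for the OS and its cache of the index files. On a 128 GB machine the cap kicks in, and most of the memory goes to the filesystem cache instead of the heap.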
Even though READS were extremely fast, SEARCHES were very poor. One
query took more than 500 ms, which troubled me. I thought ES could
easily handle 1 million records and search through them. Is this normal?
Can I improve this on just one node, or will adding more nodes with replicas
help?
If the query was expensive, if it matched lots of docs, or if it hit a part
of the index that was not cached, it will take longer.
Run the same query a few times and you should see latency go down due to
caching.
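To check the caching effect yourself, time the same query several times in a row and compare the first (cold) run with the later (warm) ones. The helper below takes any zero-argument callable, so it makes no assumptions about the client library; in practice the callable would send your search request to ES, which is not shown here. The name `cold_vs_warm` is hypothetical.

```python
import statistics
import time
from typing import Callable


def cold_vs_warm(run_query: Callable[[], object], repetitions: int = 5) -> tuple[float, float]:
    """Time `run_query` repeatedly; return (first_run_ms, median_of_later_runs_ms).

    `run_query` is a hypothetical stand-in for whatever sends one search request.
    """
    if repetitions < 2:
        raise ValueError("need at least two runs to compare cold and warm latency")
    timings_ms = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_query()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    # The first run pays for cold caches; the median of the rest reflects warm caches.
    return timings_ms[0], statistics.median(timings_ms[1:])
```

If the warm median is still close to the cold number, caching is probably not the issue, and the query itself (how expensive it is, how many documents it matches) is the more likely cause of the latency.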
What should I use for benchmarking? I am not sure YCSB is the right
tool to benchmark ES, and it does not offer much flexibility.
Apache JMeter will do.
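JMeter is the full-featured option. If you only want a quick number before building a JMeter test plan, a few lines of Python can fire requests from several threads and report latency percentiles and throughput. This is a minimal sketch, not a JMeter replacement. `run_query` is a hypothetical callable that issues one search request, and nearest-rank percentiles are a simple choice among several definitions. Python threads are adequate here because real search calls spend their time waiting on the network.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def percentile(sorted_values: list[float], pct: float) -> float:
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100.0 * len(sorted_values)))
    return sorted_values[rank - 1]


def load_test(run_query: Callable[[], object], total_requests: int, concurrency: int) -> dict[str, float]:
    """Issue `total_requests` calls from `concurrency` threads; report latency stats in ms."""
    if total_requests < 1 or concurrency < 1:
        raise ValueError("total_requests and concurrency must be at least 1")

    def timed_call(_: int) -> float:
        start = time.perf_counter()
        run_query()
        return (time.perf_counter() - start) * 1000.0

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed_call, range(total_requests)))
    wall_s = max(time.perf_counter() - wall_start, 1e-9)  # avoid dividing by zero
    return {
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "max_ms": latencies[-1],
        "throughput_rps": total_requests / wall_s,
    }
```

Running it at a few concurrency levels against one node and then against two shows whether adding nodes actually raises throughput for your queries, which is the scaling question raised earlier in the thread.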
How should I plan resources in terms of memory and CPU?
That's a very open question that nobody can precisely answer unfortunately.
Thanks for your reply. It was insightful. I have spent some time
watching videos listed on Elasticsearch's website. In particular, the one by
Shay and one by probably a colleague of yours at Berlin Buzzwords were
very helpful.
I have been able to improve query times with the basic knowledge
of ES I have. Having done that, and knowing that organizations
use ES successfully, I am confident that with time I will be able to
improve query performance.
Thanks for all the help!
(In reply to Otis Gospodnetic's message of Thursday, January 3, 2013 10:18:20 AM UTC+5:30.)