I have set up an Elasticsearch cluster with 1 shard and 0 replicas. My system has 16 GB RAM, and I have allocated 8 GB to the ES min/max heap. We are indexing a large number of logs every day, and our daily index holds approximately 3,500,000 documents.
We are using Kibana to query ES and generate reports. Most of our reports are histograms and hence require heavy faceting. My observation was this: the dashboards took very long to load (depending on the time range selected), and the field cache, which is unbounded by default, kept growing until it eventually caused an out-of-memory error.
I have now restricted the field cache size to 50% of the available heap memory. Although this does reduce the errors, performance has not improved much and searches still take a long time.
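(Editor's note: it can help to see which fields are actually filling the cache. A minimal sketch, assuming a node on localhost:9200 and using the standard indices fielddata stats endpoint:

    # Minimal sketch: list per-field fielddata memory usage, assuming an
    # Elasticsearch node on localhost:9200. This shows which fields the
    # Kibana facets are loading into the cache.
    import json
    from urllib.request import urlopen

    with urlopen("http://localhost:9200/_stats/fielddata?fields=*") as resp:
        stats = json.load(resp)

    for index_name, index_stats in stats["indices"].items():
        fields = index_stats["total"]["fielddata"].get("fields", {})
        for field, usage in fields.items():
            print(index_name, field, usage["memory_size_in_bytes"])

Fields with unexpectedly large footprints are the ones worth remapping or excluding from facets.)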
Another observation is that with 1 shard and 0 replicas, my ES node is not making use of the other CPU cores. I have 4 cores on my system, and the CPU% shown by the top command for the elasticsearch process barely exceeds 100%. I believe this indicates that it fully uses one core but not all 4.
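(Editor's note: one way to verify what that single busy core is doing is the nodes hot threads API. A quick sketch, again assuming localhost:9200:

    # Quick check, assuming a node on localhost:9200: the hot threads API
    # dumps what the busiest threads on each node are doing, which shows
    # whether search really is pegged on one thread.
    from urllib.request import urlopen

    print(urlopen("http://localhost:9200/_nodes/hot_threads").read().decode())
)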
Will increasing the number of shards make better use of the multi-core architecture and enable parallel search? If so, what is the best way to set this up? Should I make changes in the ES configuration file and then restart the cluster? How does this affect the currently existing indices? We create 2 indices per day; when I increase the number of shards, how will the data in the previously created indices be distributed among the newly created shards?
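(Editor's note: for reference, the shard count is a per-index setting fixed at index creation, so existing indices are never re-split; the usual approach is an index template, so that future daily indices are created with more shards while old ones keep their single shard. A hypothetical sketch, where the template name "daily_logs" and the pattern "logs-*" are examples, not the poster's actual names:

    # Hypothetical sketch: shard count is fixed when an index is created,
    # so a template only affects *future* indices matching the pattern;
    # existing indices keep their original single shard.
    import json
    from urllib.request import urlopen, Request

    template = {
        "template": "logs-*",       # pattern for indices created from now on
        "settings": {
            "number_of_shards": 4,  # example value matching a 4-core box
            "number_of_replicas": 0,
        },
    }
    req = Request(
        "http://localhost:9200/_template/daily_logs",
        data=json.dumps(template).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    print(urlopen(req).read().decode())
)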
Given you're only on one server, you are limited in what you can do. You'd be better off adding another node if you can; maybe someone else can comment on the rest.
Elasticsearch uses all cores by default. If you do not see 100% CPU use, that is no reason to worry; constant 100% CPU would signal bad programming style (that would be a bug). You should watch the system load instead. If system load is low, you either do not have enough query load, or your configuration prevents Elasticsearch from being fully utilized. Just by executing a few Kibana queries you will never push Elasticsearch to excessively high system load. Try 100, 1,000, or 10,000 queries a second, and you will see the load increase.
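(Editor's note: to illustrate the "more queries per second" point, here is a toy load generator; purely a sketch, assuming localhost:9200 and firing empty match_all searches from several threads:

    # Toy load generator (illustrative only): fire match_all searches from
    # several threads against localhost:9200 and watch system load climb.
    import json
    import threading
    from urllib.request import urlopen, Request

    THREADS = 8
    REQUESTS_PER_THREAD = 500
    BODY = json.dumps({"query": {"match_all": {}}, "size": 0}).encode("utf-8")

    def worker():
        for _ in range(REQUESTS_PER_THREAD):
            req = Request("http://localhost:9200/_search", data=BODY,
                          headers={"Content-Type": "application/json"})
            urlopen(req).read()

    threads = [threading.Thread(target=worker) for _ in range(THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
)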
There are many reasons for "out of memory" errors. It depends on the kind of queries and the kind of data you use. In many cases you can save enormous amounts of memory just by rewriting heavy queries or setting up a lean field mapping. Additionally, you can adjust Elasticsearch's default settings to save heap usage. But all of this has limits.
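(Editor's note: as one illustration of a "lean field mapping" for log data; the type name "logline" and the field names are hypothetical. Strings you facet on should be not_analyzed so fielddata holds one term per value, and disabling _all avoids indexing every field a second time:

    # One illustration of a lean mapping; "logline", "host" and "level"
    # are hypothetical names, delivered via the same kind of template.
    import json
    from urllib.request import urlopen, Request

    mapping = {
        "template": "logs-*",
        "mappings": {
            "logline": {
                "_all": {"enabled": False},
                "properties": {
                    "host":  {"type": "string", "index": "not_analyzed"},
                    "level": {"type": "string", "index": "not_analyzed"},
                },
            }
        },
    }
    req = Request("http://localhost:9200/_template/lean_logs",
                  data=json.dumps(mapping).encode("utf-8"),
                  headers={"Content-Type": "application/json"},
                  method="PUT")
    print(urlopen(req).read().decode())
)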
You should set up a cluster that can scale. For example, if you create 2 indices per day and want to keep them on your node for a year, you'll end up with about 730 indices, which at the default of 5 shards per index is over 3,600 shards. That is pretty heavy for a single node. Have you tested that many shards on a single node?
If you are bound to your index data structure and the Kibana queries, and cannot change them, there is not much you can do except upgrading ES and adding more nodes, as Mark already noted.
We plan to store data for only about 3–6 months, so we thought this configuration might be okay.
A couple of simultaneous Kibana dashboard queries (mainly to generate histograms) pushed the system load to 10, owing to a large number of disk I/O operations.
We would appreciate any help in pinpointing exactly which queries reaching Elasticsearch are taking so long, and how we can go about optimizing them.
Thanks.
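(Editor's note: one concrete way to pinpoint slow queries is the search slow log, which can be switched on per index at runtime so that any query exceeding the thresholds is written to the node's slow log with its full source. A sketch, assuming localhost:9200; the index name "logs-2014.05.13" is an example:

    # Sketch: enable the search slow log on an existing index so slow
    # Kibana queries show up, with their bodies, in the node's log files.
    import json
    from urllib.request import urlopen, Request

    settings = {
        "index.search.slowlog.threshold.query.warn": "5s",
        "index.search.slowlog.threshold.query.info": "1s",
        "index.search.slowlog.threshold.fetch.warn": "1s",
    }
    req = Request("http://localhost:9200/logs-2014.05.13/_settings",
                  data=json.dumps(settings).encode("utf-8"),
                  headers={"Content-Type": "application/json"},
                  method="PUT")
    print(urlopen(req).read().decode())
)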