I am running the Elastic Stack on a single machine, so I am unable to create replicas, and I don't want to run multiple nodes on the same machine. Does increasing the number of shards increase Elasticsearch query performance for my log data?
Not unless the shards are quite large. It can actually have the opposite effect and decrease query performance. What is currently your average shard size?
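You can check this with the cat APIs (console syntax below; run the same paths through curl against your node if you prefer):

```
GET _cat/indices?v&h=index,pri,docs.count,store.size
GET _cat/shards?v
```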
Each index contains one day's data, and after indexing, each day's data takes about 1.2 - 1.5 GB. Each index has 5 primary shards (the default configuration). Thanks
Can you please suggest an alternative way to increase query performance? Also, should I decrease the number of shards per index then?
With that amount of data you could have just a single shard per day, and I would not be surprised if it performed better. You can also use the shrink index API to shrink existing indices down to 1 shard. 1-2 GB is not large for a shard.
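In case it helps, the shrink flow looks roughly like this (the index names are placeholders for your daily indices; on a single-node cluster the requirement that all shards sit on one node is already met):

```
# 1. Block writes to the old index (required before shrinking)
PUT /logstash-2018.05.01/_settings
{
  "index.blocks.write": true
}

# 2. Shrink it into a new single-shard index
POST /logstash-2018.05.01/_shrink/logstash-2018.05.01-shrunk
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 0
  }
}
```

For future daily indices, set `number_of_shards: 1` in an index template matching your index name pattern so new indices get a single shard from the start.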
As of now, some queries are taking 12 - 15 seconds to return results and visualise them in Kibana (this is when we query 15 days of data, and the plan moving forward is to use the Elastic Stack for 6 months of data). Will performance (query time) degrade as the data grows? Is increasing the RAM of my machine now the only available option to improve performance?
Look at what is limiting performance. If you are seeing a lot of GC and experiencing heap pressure, more RAM and a larger heap may help. You could however just as well be limited by CPU or disk I/O, in which case more RAM might not help much. Monitor CPU, disk I/O and iowait while querying and see if any of these is the limiting factor.
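A rough sketch of how to do that on a Linux host (iostat comes from the sysstat package, which may need installing; adjust host/port on the curl call to your node):

```
# OS level: CPU usage and iowait (%wa column) from top;
# extended per-disk stats every 5 seconds from iostat
top
iostat -x 5

# Elasticsearch's own view of heap, GC and thread pool queues/rejections
curl -s 'localhost:9200/_nodes/stats/jvm,os,thread_pool?pretty'
```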
We are currently running the Elastic Stack for our log data on a server with 16 GB of RAM, and nothing other than the Elastic Stack runs on it. We want to set the Elasticsearch heap size to 8 GB. So now coming to the question: can we allocate another 4 GB of heap to Logstash, which inserts the data into Elasticsearch? Every morning a log file with the previous day's log data gets dumped onto the server. Since Logstash is running continuously, that is the only time it will be using the full heap (about 20 minutes); other than that it is mostly idle. So is it wise to allocate 4 GB of heap to Logstash?
The recommendation to allocate 50% of RAM to the Elasticsearch heap generally assumes you are just running Elasticsearch on the host, not the rest of the stack as well. In order to work well, Elasticsearch needs access to a good amount of file system cache, so assigning 8 GB to the Elasticsearch heap and 4 GB to Logstash out of the 16 GB available sounds excessive. If you only index during certain hours it might however work, but the only way to know for sure is to test.
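For reference, the Elasticsearch heap is set in its jvm.options file (e.g. /etc/elasticsearch/jvm.options on package installs; the path differs for archive installs), and Logstash has its own jvm.options file. Whatever size your testing settles on, keep the minimum and maximum equal; the 8 GB below simply mirrors the number discussed above and is not a recommendation:

```
# /etc/elasticsearch/jvm.options
-Xms8g
-Xmx8g
```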
We only index once a day and even that takes only minutes.
Do you shut down Logstash once it has finished processing or will it continue to use up the configured heap space even when not active?
We don't shut it down, because whenever a log file gets dumped onto the server Logstash must automatically process the data. But now I think we should change our model.
If you do not shut it down you will need to adjust the heap usage accordingly.
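For example, if Logstash keeps running around the clock but is only busy for about 20 minutes a day, a much smaller heap in its jvm.options leaves more of the 16 GB for the Elasticsearch heap and the file system cache (1 GB below is just an illustrative figure):

```
# /etc/logstash/jvm.options
-Xms1g
-Xmx1g
```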
@Christian_Dahlqvist Thanks for your insights.