Hi,
I am using Elasticsearch and Logstash to manage logs from a very big
system. Right now, during the test phase, I am using 1 machine. It indexes
about 2000 log messages per second. If I have 10 million log messages, it
takes a few hours for the indexing to complete. (I don't know whether it's
the indexing that's taking a long time or Logstash filtering the messages.)
If I create a cluster of ES machines, will this speed up indexing? I am
not really concerned about replicas. Can I configure ES nodes to do just
the indexing part and not worry about replicas? Please suggest any
techniques.
Yes, you can change the number of replicas on the fly using the Update
Settings API:
So you can set the number of replicas to 0 and have only your primary
shards balanced across your cluster. In this case, adding more nodes will
help your indexing speed, as long as you have enough shards to spread
across all your nodes.
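As a rough sketch of the replica change described above, here is one way to build the Update Settings request in Python with only the standard library. The host `http://localhost:9200`, the index name `logstash-2013.07.27`, and the helper name `build_replica_request` are assumptions made for illustration; they are not taken from the original setup.

```python
import json
import urllib.request


def build_replica_request(base_url, index, replicas):
    """Build a PUT <index>/_settings request that sets number_of_replicas.

    base_url and index are placeholders; point them at your own cluster.
    """
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host and index name, for illustration only.
req = build_replica_request("http://localhost:9200", "logstash-2013.07.27", 0)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# To actually send it to a running cluster (not executed here):
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

The request is only built and printed here, so nothing changes until you uncomment the `urlopen` call. Replicas can be raised again later with the same call once the bulk load is done.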
As Logstash uses daily indices by default, you'll probably want to make
sure that shards of today's index (which are hit with indexing requests)
are evenly distributed. A simple way of doing this is with the
index.routing.allocation.total_shards_per_node setting:
For example, if you have 10 shards per index (no replicas) and 5 nodes, set
that number to 2.
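The arithmetic behind that number can be sketched as follows. Rounding up when the shards do not divide evenly is my assumption (the example above divides cleanly), and the helper names `shards_per_node_limit` and `build_limit_body` are made up for illustration. The resulting body would go to the same Update Settings endpoint as the replica change.

```python
import json
import math


def shards_per_node_limit(primaries, replicas, nodes):
    """Smallest per-node limit that still lets every shard copy be allocated.

    Total shard copies = primaries * (1 + replicas), spread over `nodes`.
    Rounding up is an assumption for the case where the division is uneven.
    """
    total_copies = primaries * (1 + replicas)
    return math.ceil(total_copies / nodes)


def build_limit_body(limit):
    """JSON body for PUT <index>/_settings with the per-node shard limit."""
    return json.dumps({"index.routing.allocation.total_shards_per_node": limit})


# The case from the text: 10 shards, no replicas, 5 nodes.
limit = shards_per_node_limit(primaries=10, replicas=0, nodes=5)
print(limit)  # 2
print(build_limit_body(limit))
```

Keep in mind that this is a hard limit: with the value set to the exact minimum, losing a node may leave some shards unassigned until the node returns or the limit is raised.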
There are quite a lot of tricks to get your indexing speed up, although
there's almost always a trade-off. Here are the top 3 (IMO):
You may want to monitor your cluster to see what works and what
doesn't, and where your bottlenecks are. If you're looking for a monitoring
tool for Elasticsearch, check out our SPM:
On Sat, Jul 27, 2013 at 3:49 AM, Kiran Madabhushi maskiran@gmail.com wrote: