Indexing rate:
1 cluster, 1 node: approx. 2,500 to 3,000/sec
1 cluster, 2 nodes: approx. 5,000 to 5,500/sec
1 cluster, 3 nodes: approx. 5,800 to 6,000/sec
Notes:
All the nodes are on the same server.
The documents are of medium size, with a maximum of 8-10 fields.
I tried Logstash with 2 and with 4 worker threads. The filter expressions are simple ones.
The index refresh_interval is set to -1.
CPU utilization goes up to 85%.
To me, an indexing rate of 5,000/sec is pretty slow, and the solution wouldn't work at that rate.
Can you please suggest a standard configuration to achieve an indexing rate of, say, 50K per second?
I know this is a generic and open-ended question. However, I would like to know the standard parameters to consider when sizing the stack.
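As a rough sanity check on the figures above, the scaling can be put into numbers. The sketch below is only illustrative: it assumes the midpoint of each reported range as the rate for that node count, and the function names are made up for this example.

```python
# Midpoints of the ranges reported above (per second), keyed by node count.
# Using midpoints is an assumption made only for this illustration.
rates = {1: 2750.0, 2: 5250.0, 3: 5900.0}


def scaling_efficiency(rates):
    """Measured rate divided by the rate perfect linear scaling would give."""
    base = rates[1]
    return {n: r / (n * base) for n, r in sorted(rates.items())}


def marginal_gain(rates):
    """Extra throughput gained by adding each node after the first."""
    nodes = sorted(rates)
    return {n: rates[n] - rates[prev] for prev, n in zip(nodes, nodes[1:])}


efficiency = scaling_efficiency(rates)
gains = marginal_gain(rates)

for n in sorted(rates):
    extra = f", +{gains[n]:.0f}/sec over previous" if n in gains else ""
    print(f"{n} node(s): {rates[n]:.0f}/sec, efficiency {efficiency[n]:.0%}{extra}")
```

With these midpoints the second node adds about as much as the first, while the third adds far less, which is the pattern to explain before sizing for 50K/sec.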
@mvenkat_in, the bottleneck at 3 nodes is not likely to be Elasticsearch, but Logstash. As you can see, the speed increase is no longer linear after 2 nodes. Part of the increase comes from primary and replica shards being spread across more nodes, so more disk I/O is available. As for why the ingest stops scaling, it is hard to say, because you haven't shown your Logstash configuration. Heavy filter usage can limit your output. A small Elasticsearch cluster can usually index much faster than a single Logstash instance with a moderate amount of event filtering can feed it.
How many Logstash instances do you have? Are you using a broker, like Redis/Kafka/RabbitMQ? What is your input source? What filters and/or other outputs do you have?
Without answers to these questions, we can only guess why your indexing speed is lower than you'd like.
I have only one Logstash instance running to read this specific input.
I don't use any broker.
The input source is a file (or a list of files from a directory), each 1.5 GB in size.
How can I run multiple instances of Logstash reading from the same input file, or how else can I increase the throughput of Logstash?
Or, in order to isolate whether the problem is in Logstash or in Elasticsearch, is there any way to load test Elasticsearch directly (high-load indexing) and see whether I get a better indexing rate?
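One way to take Logstash out of the picture is to push synthetic documents straight into the Elasticsearch bulk endpoint and time it. The following is a minimal sketch under stated assumptions, not a benchmark tool: the index name `loadtest`, the URL `http://localhost:9200`, the field names, and the batch sizes are all placeholders. Older Elasticsearch versions expect a `_type` in the action line, while recent ones reject it, so it is left optional here.

```python
import json
import time
import urllib.request


def make_docs(count, fields=10):
    """Generate synthetic documents with a fixed number of small fields."""
    return [
        {f"field{i}": f"value-{n}-{i}" for i in range(fields)}
        for n in range(count)
    ]


def build_bulk_body(docs, index, doc_type=None):
    """Build the newline-delimited body the _bulk endpoint expects."""
    lines = []
    for doc in docs:
        meta = {"_index": index}
        if doc_type is not None:
            meta["_type"] = doc_type  # only for versions that still use types
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(doc))
    # The bulk body must end with a trailing newline.
    return ("\n".join(lines) + "\n").encode("utf-8")


def run_load_test(base_url, index, batches=100, batch_size=1000, doc_type=None):
    """Send bulk requests sequentially and return documents per second."""
    sent = 0
    start = time.perf_counter()
    for _ in range(batches):
        body = build_bulk_body(make_docs(batch_size), index, doc_type)
        request = urllib.request.Request(
            base_url.rstrip("/") + "/_bulk",
            data=body,
            headers={"Content-Type": "application/x-ndjson"},
            method="POST",
        )
        with urllib.request.urlopen(request) as response:
            result = json.load(response)
        if result.get("errors"):
            raise RuntimeError("bulk request reported item-level errors")
        sent += batch_size
    return sent / (time.perf_counter() - start)


# Example call (needs a reachable cluster; URL and index are placeholders):
# print(run_load_test("http://localhost:9200", "loadtest"))
```

A single sequential client like this can itself become the limit, so the number it reports is a lower bound on what the cluster can take. If even this simple client gets well past the rate seen through Logstash, the sending side is the more likely suspect.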
Ah yeah, SAN is not recommended. At least you won't get high speed indexing there.
The other issue is that you only have one Logstash instance feeding your cluster. You'll need some form of buffer in the middle so you can spin up multiple Logstash instances as consumers. The buffer can be a message queue such as Redis or Kafka (the latter is my recommendation).
I've gotten a much higher indexing rate than yours on a 1-node cluster (20x), but I had 25 Logstash instances feeding it. This was in a test environment running ES 1.7.3 on very fast SSDs, and very specific to my environment. YMMV.
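The buffer-plus-consumers layout can be sketched in a few lines. This is only an illustration of the topology: an in-process `queue.Queue` stands in for Redis or Kafka, threads stand in for separate Logstash instances, and the upper-casing step is a placeholder for real filtering and output. It says nothing about actual throughput.

```python
import queue
import threading


def run_pipeline(events, consumers=4, buffer_size=1000):
    """One producer fills a bounded buffer; several consumers drain it."""
    buffer = queue.Queue(maxsize=buffer_size)  # stand-in for the broker
    processed = []
    lock = threading.Lock()
    stop = object()  # sentinel telling a consumer to exit

    def consume(worker_id):
        while True:
            event = buffer.get()
            if event is stop:
                break
            result = (worker_id, event.upper())  # placeholder for filter + output
            with lock:
                processed.append(result)

    workers = [
        threading.Thread(target=consume, args=(i,)) for i in range(consumers)
    ]
    for worker in workers:
        worker.start()

    # The single reader only has to enqueue; it blocks when the buffer is full.
    for event in events:
        buffer.put(event)
    for _ in workers:
        buffer.put(stop)
    for worker in workers:
        worker.join()
    return processed


results = run_pipeline([f"line {i}" for i in range(100)], consumers=4)
print(len(results), "events processed")
```

The point of the design is that the file reader does almost no work per event, and the expensive part can be scaled by adding consumers without touching the reader.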
Hi Tin,
Hmm, OK. It is Fibre Channel SAN storage, so I thought it would be faster.
But how can I dig deeper and isolate the problem area, i.e., whether the slowness is at the sending end (Logstash) or the receiving end (Elasticsearch)?
Is there any way to monitor the Logstash throughput or queue sizes?
How can I run multiple Logstash instances in parallel, say, reading from the same input?
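On the receiving end, one simple measurement is to sample the cluster's cumulative indexing counter twice and divide the difference by the elapsed time. The sketch below assumes the index stats API is reachable at a placeholder URL and reads `_all.primaries.indexing.index_total` from its response; the helper names and the sampling interval are made up for this example. It does not cover Logstash-side queue sizes.

```python
import json
import time
import urllib.request


def indexing_rate(first_total, second_total, seconds):
    """Documents indexed per second between two counter samples."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (second_total - first_total) / seconds


def fetch_index_total(base_url):
    """Read the cumulative count of index operations on primary shards."""
    url = base_url.rstrip("/") + "/_stats/indexing"
    with urllib.request.urlopen(url) as response:
        stats = json.load(response)
    return stats["_all"]["primaries"]["indexing"]["index_total"]


def sample_rate(base_url, interval=10.0):
    """Take two samples `interval` seconds apart and return the rate."""
    started = time.monotonic()
    first = fetch_index_total(base_url)
    time.sleep(interval)
    second = fetch_index_total(base_url)
    return indexing_rate(first, second, time.monotonic() - started)


# Example call (needs a reachable cluster; the URL is a placeholder):
# print(sample_rate("http://localhost:9200"))
```

If this rate stays flat while Elasticsearch CPU and disk still have headroom, the sender is the more likely bottleneck; if the nodes are saturated at that rate, the cluster is.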