How to Increase Indexing rate

mvenkat_in · December 15, 2015, 11:50am

Stack: Logstash 2.1 --> Elasticsearch 2.1 --> Kibana 4.3.0
Logstash Input: File

Server Spec: RHEL, 8 Core, 16 GB RAM

Indexing Rate:
1 Cluster - 1 Node --> Approx 2500 to 3000 /Sec
1 Cluster - 2 Nodes --> Approx 5000 to 5500 /Sec
1 Cluster - 3 Nodes --> Approx 5800 to 6000 /Sec
Notes:
All the nodes are on the same server.
The documents are medium-sized, with at most 8-10 fields.
Logstash was tried with 2 and with 4 worker threads; the filter expressions are simple ones.
The index refresh_interval is set to -1.
CPU utilization goes up to 85%.

To me, an indexing rate of 5000/sec is pretty slow, and the solution wouldn't work at that rate.
Can you please suggest a standard configuration to achieve, say, 50K documents indexed per second?
I know this is a generic, open-ended question. However, I would like to know the standard parameters to consider when sizing the stack.

Thanks

theuntergeek · December 15, 2015, 1:41pm

@mvenkat_in, the bottleneck at 3 nodes is not likely to be Elasticsearch, but Logstash. As you can see, the speed increase is no longer linear after 2 nodes. Part of the increase comes from replicas and primary shards being more distributed, so there is more disk I/O available. As for why the ingest doesn't scale beyond that, it is hard to say, because you haven't shown your Logstash configuration. Heavy filter usage can limit your output. A small Elasticsearch cluster can usually index much more quickly than a single Logstash instance with a moderate amount of event filtering can feed it.

How many Logstash instances do you have? Are you using a broker, like Redis/Kafka/RabbitMQ? What is your input source? What filters and/or other outputs do you have?

Without the answers to these questions, we can only guess why your indexing speed is lower than you'd like.

tinle · December 15, 2015, 6:09pm

You are maxing out your HW I/O. Is this server using spinning disks or SSDs?

Tin

mvenkat_in · December 16, 2015, 3:01am

Hi Aaron

I have only one Logstash instance running to read this specific input.
I don't use any broker.
The input source is a file (or a list of files from a directory), each 1.5 GB in size.

How can I run multiple instances of Logstash reading from the same input file, or how else can I increase Logstash throughput?

Or, to isolate whether the problem is in Logstash or Elasticsearch, is there any way to load test Elasticsearch (high-load indexing) and see if I get a better indexing rate?

Below is the Logstash configuration file.

input {
  file {
    path => "/syshome/app/elk_agent/notif/data/ns*.log"
    start_position => "beginning"
    sincedb_path => "notif"
    type => "notif"
  }
}

filter {
  if [type] == "notif" {
    if [message] =~ "ElapsedTime" {
      grok {
        match => { "message" => "%{DATESTAMP:trans_dtm} %{NOTSPACE} %{LOGLEVEL} %{SPACE} %{NOTSPACE} %{JAVAFILE} %{SPACE} %{NOTSPACE} %{NOTSPACE} %{WORD}:%{WORD:action}-%{WORD}:%{WORD:trans_id}-%{WORD}:%{WORD:msisdn}-%{WORD}:%{WORD:service}-%{WORD}:%{GREEDYDATA:attr_1}-%{WORD}:%{WORD:status}-%{WORD}:%{WORD:response_code}-%{WORD}:%{WORD:attr_2}-%{WORD}:%{WORD:attr_3}-%{WORD}:%{WORD:response_time}" }
      }
      mutate {
        remove_field => [ "message", "path", "type", "host" ]
        gsub => [ "trans_dtm", ",\d{3}$", "" ]
        convert => {
          "response_code" => "integer"
          "response_time" => "integer"
        }
        add_field => [ "response_desc", "%{status}" ]
      }

      grok {
        match => [ "trans_dtm", "^%{MONTHDAY:day}-%{MONTHNUM:month}-%{YEAR:year}" ]
      }

      mutate {
        add_field => { "[@metadata][index]" => "%{year}-%{month}-%{day}" }
        remove_field => [ "year", "month", "day" ]
      }

      mutate { add_field => { "[@metadata][type]" => "notif" } }

      date {
        match => [ "trans_dtm", "dd-MM-YYYY HH:mm:ss" ]
        target => "trans_dtm"
      }
    } else {
      drop { }
    }
  }
}

output {
  if "_grokparsefailure" not in [tags] {
    elasticsearch {
      hosts => "10.0.148.65:9200"
      index => "e2e-%{[@metadata][index]}"
      document_type => "%{[@metadata][type]}"
    }
  } else {
    file { path => "/elkapp/elk/app/logstash-2.1.0/logs/parseerr.log" }
  }
}

mvenkat_in · December 16, 2015, 3:01am

Hi Tinle
The disks are on SAN storage.
Below is the SAR output.

tinle · December 16, 2015, 3:37am

Ah yeah, SAN is not recommended. At least you won't get high speed indexing there.

The other issue is that you only have one Logstash instance feeding your cluster. You'll need some form of buffer in the middle so you can spin up multiple Logstash instances as consumers. The buffer can be a message queue such as Redis or Kafka (which is my recommendation).

I've gotten a much higher indexing rate (20x) than yours on a 1-node cluster, but I had 25 Logstash instances feeding it. This was in a test environment running ES 1.7.3, with very fast SSDs, and very specific to my environment. YMMV.

Tin

mvenkat_in · December 16, 2015, 4:08am

Hi Tin
Hmm, OK. It is Fibre Channel SAN storage, so I thought it would be faster.
But how can I dig deeper and isolate the problem area, i.e. slowness at the sending end (Logstash) or at the receiving end (Elasticsearch)?

Is there any way to monitor Logstash throughput or queue sizes?
How can I run multiple Logstash instances in parallel, say reading from the same input?

Thanks

tinle · December 16, 2015, 8:18pm

You need to use a queue as a buffer. Something like this:

logfiles -> logstash -> message queue -> logstash (multiple) -> elasticsearch
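
As a rough sketch of that topology, the two Logstash roles could look something like the following. This uses Redis rather than Kafka only because the options are fewer; the Redis host name and the key are made-up placeholders, and the filter block is the one from the original config.

```
# shipper.conf: a single instance tails the files and pushes raw lines to the queue
input {
  file {
    path => "/syshome/app/elk_agent/notif/data/ns*.log"
    start_position => "beginning"
    sincedb_path => "notif"
    type => "notif"
  }
}
output {
  redis {
    host => "redis.example.internal"   # hypothetical host
    data_type => "list"
    key => "notif_events"              # hypothetical key
  }
}

# indexer.conf: start several copies of this; each pops events from the same list
input {
  redis {
    host => "redis.example.internal"   # hypothetical host
    data_type => "list"
    key => "notif_events"              # hypothetical key
  }
}
# the existing filter { ... } block (grok, mutate, date) goes here unchanged
output {
  elasticsearch {
    hosts => "10.0.148.65:9200"
    index => "e2e-%{[@metadata][index]}"
    document_type => "%{[@metadata][type]}"
  }
}
```

The idea is that the shipper does no parsing at all, so the expensive grok work is spread across however many indexer instances you run.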

Monitoring of Logstash metrics is rather lacking at the moment. The only thing available that I know of is the Logstash metrics filter.

https://www.elastic.co/guide/en/logstash/current/plugins-filters-metrics.html
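
A minimal sketch of how that filter is typically used, adapted from the plugin documentation. The meter name "events" and the tag "metric" are arbitrary choices, and the field reference syntax for the rate may differ between Logstash versions, so check it against the docs for your release.

```
filter {
  metrics {
    meter => "events"      # counts every event passing through the filter stage
    add_tag => "metric"    # tags the periodic metric events so they can be routed
  }
}

output {
  # print only the generated metric events, not the log events themselves
  if "metric" in [tags] {
    stdout {
      codec => line {
        format => "1m rate: %{[events][rate_1m]}"
      }
    }
  }
}
```

If you add this to an existing config, the real outputs should exclude events tagged "metric" so the rate events are not indexed into Elasticsearch.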

Tin

msimos · December 16, 2015, 9:00pm

Can you try using a separate input for each file instead of a wildcard? The file input uses a single thread per input. See if this makes any difference.
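
Something along these lines, as a sketch only. The file names ns1.log and ns2.log and the sincedb names are made up for illustration; each input gets its own sincedb_path here on the assumption that sharing one file between inputs would cause them to overwrite each other's positions.

```
input {
  file {
    path => "/syshome/app/elk_agent/notif/data/ns1.log"   # hypothetical file name
    start_position => "beginning"
    sincedb_path => "notif_1"                             # hypothetical, one per input
    type => "notif"
  }
  file {
    path => "/syshome/app/elk_agent/notif/data/ns2.log"   # hypothetical file name
    start_position => "beginning"
    sincedb_path => "notif_2"                             # hypothetical, one per input
    type => "notif"
  }
}
```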

mvenkat_in · December 17, 2015, 2:30am

Hi Mike
I understand your suggestion. However, the results I posted earlier are for a single file only.
