Filebeat to Elasticsearch log shipping is very slow


(Raji Sankaran) #1

Hi,

I have found some earlier posts on this, but none of them gives a definitive answer for Beats 6.x, where spool_size has been removed.

The following tests were performed on:
Filebeat 6.x
Elasticsearch 6.2.4 with 16GB Heap

Filebeat config with file output gives 80,000 events/s:

output.file:
  path: "/opt/CCURfilebeat"
  filename: filebeat
  number_of_files: 7
  permissions: 0600

queue:
  mem:
    events: 40000
    flush.min_events: 20000

Filebeat config with Elasticsearch output gives 14,000 events/s:

output.elasticsearch:
  hosts: ["elastic-server:9200"]
  bulk_max_size: 20000
  username: "elastic"
  password: "elasticpassword"

queue:
  mem:
    events: 40000
    flush.min_events: 20000

The Elasticsearch indexing rate is 13,800 events/s, and this seems to be the bottleneck.

What I don't understand is that Elasticsearch CPU utilization is only 10% and JVM heap used is 6GB of 16GB. Why is the indexing rate still so low? What other factors should we consider in order to stress the Elasticsearch system?

Any suggestions on improving this performance would be highly appreciated.


(Christian Dahlqvist) #2

Have you optimised Elasticsearch for indexing speed? What type of storage do you have? What do disk I/O and iowait look like? How many nodes are in the cluster?
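
For anyone wanting to answer the iowait question without extra tooling, here is a minimal sketch (not from the thread) that derives the iowait share from two samples of the aggregate "cpu" line in /proc/stat on Linux. The function name iowait_percent is hypothetical; the field order (user, nice, system, idle, iowait, irq, softirq, steal) is the standard /proc/stat layout.

```python
def iowait_percent(before: str, after: str) -> float:
    """Percentage of CPU time spent in iowait between two /proc/stat samples.

    Both arguments are the aggregate line that starts with "cpu ".
    """
    def fields(line: str) -> list:
        parts = line.split()
        if not parts or parts[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
        # user, nice, system, idle, iowait, irq, softirq, steal
        # (guest time is already counted inside user, so it is skipped)
        return [int(value) for value in parts[1:9]]

    deltas = [b - a for a, b in zip(fields(before), fields(after))]
    total = sum(deltas)
    if total == 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is iowait
```

To use it, read the first line of /proc/stat on the Elasticsearch host, wait a few seconds while indexing is running, read it again, and pass both lines in. A consistently high value would point at storage rather than CPU.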


(Raji Sankaran) #3

Hi,

After a brief benchmark, it appears that the problem was with Elasticsearch indexing speed. Thanks, Christian!

For benchmarking purposes, we have a single-node Elasticsearch cluster with a single index, one shard and no replicas. The machine has a 400GB SSD, 32GB RAM (of which 16GB is allocated to the heap) and a 12-core processor.
We are using a single merge thread, a 2GB translog flush threshold and a 30s refresh interval.
The index settings are as follows:

"index.merge.scheduler.max_thread_count" : "1",
"index.translog.flush_threshold_size" : "2gb",
"index.refresh_interval": "30s",
"index.mapping.total_fields.limit":"30000"
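
As a rough sketch of how settings like these could be applied to an existing index, the snippet below builds a PUT request for the index settings API using only the Python standard library. The index name "filebeat-test" is hypothetical, the host is the one from the Filebeat config above, authentication is left out, and it assumes these settings may be changed on a live index.

```python
import json
import urllib.request

# Settings copied from the post; keys use the flat "index.*" form.
INDEX_SETTINGS = {
    "index.merge.scheduler.max_thread_count": "1",
    "index.translog.flush_threshold_size": "2gb",
    "index.refresh_interval": "30s",
    "index.mapping.total_fields.limit": "30000",
}


def build_settings_request(host: str, index: str, settings: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT <index>/_settings request."""
    url = "http://{}/{}/_settings".format(host, index)
    body = json.dumps(settings).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# "filebeat-test" is a placeholder index name, not one from the thread.
request = build_settings_request("elastic-server:9200", "filebeat-test", INDEX_SETTINGS)
```

Sending it would be a matter of passing the request to urllib.request.urlopen; with X-Pack security enabled, an Authorization header would have to be added first.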

I am unable to index more than 60,000 documents/s from Filebeat.

According to X-Pack monitoring, CPU utilisation is 60%, JVM heap utilisation is 72% and disk I/O is 130 MB/s. None of these appears to be the bottleneck.

I am not sure what else might be the bottleneck. Is there a way to find out which factors might contribute to this?
Thanks in advance.


(Christian Dahlqvist) #4

If you are using dynamic mappings (I am guessing this may be the case based on the number of fields you have specified) and are adding fields as indexing progresses, each change will require the cluster state to get updated, which can slow indexing down.
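
One way to rule this out during a benchmark is to create the index with a fully static mapping and dynamic mapping set to "strict", so that any unmapped field is rejected instead of triggering a mapping update. The sketch below only builds the index-creation body; the field names are hypothetical, and the "doc" type name is an assumption based on what Filebeat 6.x uses.

```python
import json


def build_static_mapping(properties: dict, doc_type: str = "doc") -> dict:
    """Index-creation body with a static mapping (Elasticsearch 6.x style).

    "dynamic": "strict" makes Elasticsearch reject documents that contain
    fields missing from the mapping, so the mapping cannot grow while
    indexing is in progress.
    """
    return {
        "mappings": {
            doc_type: {
                "dynamic": "strict",
                "properties": properties,
            }
        }
    }


# Hypothetical fields for an HTTP access log; adjust to the real documents.
body = build_static_mapping({
    "@timestamp": {"type": "date"},
    "message": {"type": "text"},
    "source": {"type": "keyword"},
})
print(json.dumps(body, indent=2))
```

If indexing then fails with strict-mapping errors, fields were being added dynamically after all.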


(Raji Sankaran) #5

Sorry, that parameter was not needed. We use static mappings. Any other factors that might affect this performance?

I am wondering why it tops out at 60,000 documents/s when Elasticsearch can do much more. Not knowing what the bottleneck is bothers me. I am sure I am missing something here.


(Raji Sankaran) #6

According to the Elasticsearch benchmarks for HTTP logs at https://elasticsearch-benchmarks.elastic.co/index.html#tracks/http-logs/nightly/30d, the indexing rate seems to be 171,000 docs/s for a 3-node Elasticsearch cluster.

With a single-node Elasticsearch, I am able to reach 60,000 docs/s, which seems fine by comparison. But is this comparison valid?

Apart from the usual resources such as CPU, memory, disk I/O and network, what other factors could limit Elasticsearch performance?


(Christian Dahlqvist) #7

Documents per second is not really a very good measurement of indexing performance as it will depend a lot on the size and complexity of the documents being indexed. You will get a better comparison if you run the same Rally track on your hardware.
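
To illustrate why docs/s alone can mislead, here is a small sketch that converts a document rate into a data rate for a given average document size. The rates are the ones quoted in this thread; the document sizes (200 and 1,000 bytes) are purely hypothetical and should be replaced with measured values.

```python
def per_node_rate(docs_per_s: float, nodes: int) -> float:
    """Naive per-node document rate (ignores shard and replica layout)."""
    return docs_per_s / nodes


def ingest_mb_per_s(docs_per_s: float, avg_doc_bytes: float) -> float:
    """Raw data rate in MB/s for a given document rate and average size."""
    return docs_per_s * avg_doc_bytes / 1_000_000


# Rates from the thread: 171,000 docs/s on 3 nodes vs 60,000 docs/s on 1 node.
benchmark_per_node = per_node_rate(171_000, 3)
single_node = per_node_rate(60_000, 1)

# Hypothetical average sizes: small benchmark documents vs larger Filebeat events.
small_docs = ingest_mb_per_s(benchmark_per_node, 200)
large_docs = ingest_mb_per_s(single_node, 1_000)

print(benchmark_per_node, single_node)
print(small_docs, large_docs)
```

Under these made-up sizes, similar per-node document rates correspond to very different amounts of data, which is why running the same track on the same hardware gives a fairer comparison.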


(Raji Sankaran) #8

Yeah, that makes sense. But this benchmark is for HTTP logs, and I am importing raw logs from an HTTP server as well, so I was hoping it would be close enough.

I am still struggling to find what else could be the bottleneck. Any pointers would be highly appreciated.


(Christian Dahlqvist) #9

The standard HTTP logs track uses very small documents, so it may or may not be comparable. I created a track that simulates events that are a bit larger and is probably closer to what you would get out of Filebeat. We talked about it here and it is available on GitHub.

