Hello!
I'm having some difficulty maximizing performance and I'd like some advice.
I'm using Filebeat (v6.2.2) to send output directly to Elasticsearch (v6.2.2), on an Ubuntu Linux machine with 16 cores and 32 GB of RAM. All the configuration files are included below.
The goal is to store around 42000 logs from 3 different servers:
- First server (the heaviest): 12200 logs, each containing between 100k and 300k lines, 750 GB in total
- Second server: 14900 logs, each containing between 30k and 150k lines, 250 GB in total
- Third server: 14900 logs, each containing between 30k and 150k lines, 250 GB in total
At the moment, the throughput starts at around 15000 events per second but falls to 5000 eps. As an example, I tried to send 480 logs (around 19 million events) and it took 45 minutes, which works out to roughly 19 million events / 2700 s ≈ 7000 eps on average.
I have tried to summarize my doubts in the following questions:
- Does the number of harvesters started affect performance? Is it better to harvest batches of 500 logs instead of 4000 logs? The main difference would be the number of lines to process at the same time (19 million events versus 500 million).
- What is the rule of thumb for setting the bulk_max_size option in filebeat.yml? After a couple of tries it seems that a smaller value (such as 5000) works better than a large one, even though I expected a larger value to mean more lines processed at the same time (and with logs of 50k lines, not splitting each log into too many batches looked like a good idea).
- As you can see, I have set 1000 workers: I realize it's an absurd number, but with too low a value it was very slow. Since I'm working with a single node in a single cluster, what is the ideal number of workers?
- Related to the previous question, should I use more than one node? I'm planning to use only one index with 200 shards, and using more than one index is not an option for me.
- Would modifying the queue.mem settings provide any improvement? (See the sketch after the filebeat.yml section below for what I mean.)
- In your opinion, what is the best way to measure throughput? I have mainly been using the Filebeat logs and direct experience (load a certain number of events and see how long it takes); there is an example of what I check right after this list.
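Regarding the last question, this is what I run on the Elasticsearch side to cross-check the numbers from the Filebeat logs. I'm assuming these two standard APIs give a fair picture, and that dividing the growth of docs.count by the elapsed time is a reasonable approximation of throughput (the host and index name are the ones from my configuration):

# document count and store size, sampled before and after a test run
curl -s 'localhost:9200/_cat/indices/my_index?v&h=index,docs.count,store.size'
# cumulative indexing stats (index_total, index_time_in_millis) for the same index
curl -s 'localhost:9200/my_index/_stats/indexing?pretty'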
Thank you in advance, any help will be much appreciated!
elasticsearch.yml
bootstrap.memory_lock: true
indices.memory.index_buffer_size: 50%
indices.memory.min_index_buffer_size: 192mb
jvm.options
-Xms15g
-Xmx15g
filebeat.yml
filebeat.prospectors:
- type: log
  enabled: true
  paths:
    - /path/to/logs/*.json
  json.keys_under_root: true

#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
  hosts: ["localhost:9200"]
  index: "my_index"
  loadbalance: true
  worker: 1000
  bulk_max_size: 5000
  compression_level: 0
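Regarding the queue.mem question above: I have not touched the queue yet, so Filebeat is running with the defaults. If I understand the documentation correctly, the block I would be tuning sits at the top level of filebeat.yml and looks roughly like this (the values below are only an illustration, not something I have tested):

# internal memory queue between the prospectors and the Elasticsearch output
queue.mem:
  events: 65536          # illustrative value, not my current setting
  flush.min_events: 4096 # illustrative value, not my current setting
  flush.timeout: 1s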
Settings for my_index
"settings": {
"number_of_shards": 20,
"number_of_replicas": 0,
"codec": "best_compression",
"refresh_interval": "30s",
"translog.sync_interval": "1m",
"translog.flush_threshold_size": "1gb",
"translog.durability" : "async",
"merge.scheduler.max_thread_count": "1"}
For testing I'm creating the index with only 20 shards, but as I said above I'll need many more (200); would that be a problem for indexing speed? The max_thread_count is set to 1 since the index lives on spinning disks.
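For completeness, this is roughly how I create the test index with the settings above (same host and index name as in filebeat.yml):

curl -X PUT 'localhost:9200/my_index' -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 20,
    "number_of_replicas": 0,
    "codec": "best_compression",
    "refresh_interval": "30s",
    "translog.sync_interval": "1m",
    "translog.flush_threshold_size": "1gb",
    "translog.durability": "async",
    "merge.scheduler.max_thread_count": "1"
  }
}'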