Performance issues during data ingestion

Hi everyone,

I am currently testing the Elastic Stack for observability use cases in my company. For testing purposes we built a small Elasticsearch cluster (3 nodes) and are ingesting HTTP logs with Filebeat.

Our PoC setup looks like this:

3 ES nodes: 8 cores, 8 GB RAM (4 GB ES heap), 100 GB HDD
Filebeat: 4 cores, 4 GB RAM, 50 GB HDD

During testing we noticed very slow performance when ingesting HTTP logs from Filebeat to Elasticsearch. We used about 5 GB of old Apache logs, and ingesting them into an index with three primary shards (no replicas) took about 6 hours on average. We tested this multiple times.
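(For reference, the index was set up with just the shard and replica settings, along these lines; the index name and host are placeholders and this is only a sketch of the request:)

```
# Create the test index with three primary shards and no replicas
curl -X PUT "http://localhost:9200/http-logs" \
  -H 'Content-Type: application/json' \
  -d '{"settings": {"number_of_shards": 3, "number_of_replicas": 0}}'
```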

Using only a single Elasticsearch instance and a single primary shard, ingesting the same data took even longer, with an average duration of 8-9 hours.

For comparison: using Splunk and its Universal Forwarder, indexing the same data takes only about 15 minutes (on the same systems).

Every configuration is kept as close to the defaults as possible. We aren't using any X-Pack extensions and have only configured the minimum Elasticsearch and Filebeat settings. (We do not want to tune performance to the maximum, but rather see the "default" performance of the Elastic Stack.)

We have also already gone through the performance optimization guide from elastic.co.

Here are some system stats from the Elasticsearch nodes, collected during ingestion.

CPU usage: about 50% (htop shows a load average of ~4 on 8 cores)
RAM usage: about 5-6 of 8 GB per node (4 GB is fully assigned as ES heap, with bootstrap.memory_lock: true)
iostat:

|Node|%user|%iowait|kB_read/s|kB_wrtn/s|~ uptime|
|----|-----|-------|---------|---------|--------|
|ES-1|4.20|13.00|108|1423|160 min|
|ES-2|4.15|12.37|113|1425|160 min|
|ES-3|4.07|12.63|96|1389|160 min|
|FB|1.31|1.27|559|19| |
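
(For context, these values are rough averages of periodic iostat samples taken on each node, roughly like the following; the 5-second interval here is just an example:)

```
# Print CPU utilization (%user, %iowait, ...) and per-device throughput
# in kB/s every 5 seconds; the table above averages these samples
iostat -k 5
```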

Is this kind of performance expected? If not, does anyone have an idea where the problem might be? I am happy to provide any additional information needed and can also offer a live look at our PoC systems.

Best regards,
Mo

(Reddit link)

That is indeed very poor throughput and not normal. Given the fairly high iowait reported together with the fairly low write rates, I would start by investigating whether the storage you are using is the limiting factor. Indexing in Elasticsearch is I/O intensive, so slow storage will cap throughput. I would also recommend looking at this section in the documentation.
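
One quick way to check is to benchmark the data disk on one of the ES nodes directly, for example with fio (just a sketch, assuming fio is available; the directory and sizes are placeholders for your actual data path):

```
# Random 4k writes against the Elasticsearch data directory (placeholder path).
# A single HDD typically manages only on the order of 100-200 random IOPS;
# compare the reported IOPS/throughput with what you see during ingestion.
fio --name=es-disk-test --directory=/var/lib/elasticsearch \
    --rw=randwrite --bs=4k --size=1G --numjobs=1 \
    --ioengine=libaio --direct=1 --runtime=60 --time_based \
    --group_reporting
```

You can also watch %util in `iostat -x 5` while Filebeat is running; a device sitting near 100% utilization points at the disk as the bottleneck.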

Hi, thanks for the answer.

How could I investigate whether the HDDs are limiting throughput and are therefore the reason the data ingestion takes so long?

Why is Splunk not affected by this? (Again, the same VMs are used for both tools.)

Try checking /etc/systemd/system/multi-user.target.wants/filebeat.service
and add this to the [Service] section:

[Service]
LimitNOFILE=100000

In my experience, this helps increase performance.
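
If you prefer not to edit the unit file installed by the package, a systemd drop-in achieves the same thing (a sketch, assuming a systemd-based host; the drop-in file name is arbitrary):

```
# Create a drop-in override for the Filebeat unit instead of editing it in place
sudo mkdir -p /etc/systemd/system/filebeat.service.d
sudo tee /etc/systemd/system/filebeat.service.d/limits.conf <<'EOF'
[Service]
LimitNOFILE=100000
EOF

# Reload systemd and restart Filebeat so the new limit takes effect
sudo systemctl daemon-reload
sudo systemctl restart filebeat
```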

Also check this blog
