Poor performance and a lot of GC overhead

Hi,

we are running elasticsearch with the following spec:

1 node on a hardware server
16GB of RAM
2GB assigned for heap
8 CPU cores (Intel Xeon CPU E5620 @ 2.40GHz)
HDD disks
Elasticsearch 5.6.3
JVM 1.8.0_171

we have around 20 indices, some daily, some weekly, some monthly.
each index has the default 5 shards and 1 replica.
each index might get around 1M documents per day at most.
indexing is happening constantly, and every minute or so there are also search operations running.

the spec described above is what we have to work with :slight_smile: and I know it sucks... but nothing much to do there.
the reason that there are only 2GB allocated for heap is that there are other components running on this machine and they also need resources...

the immediate issue we can pinpoint is that we are seeing a lot of GC overhead alerts in elasticsearch logs such as these:

[gc][999] overhead, spent [842ms] collecting in the last [1.4s]
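
in case it's relevant, heap usage and GC stats can also be pulled from the node stats API, e.g. (assuming the node listens on the default localhost:9200):

# per-node JVM heap usage plus GC collection counts and times
curl -s 'localhost:9200/_nodes/stats/jvm?pretty'

# quick overview of heap pressure per node
curl -s 'localhost:9200/_cat/nodes?v&h=name,heap.percent,heap.max,ram.percent'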

so my questions are as follows:

  1. what might be the cause of this GC overhead? And is it an issue to worry about (it seems to me that it is...)?
  2. what can we do to properly measure the actual capacity of this server? I mean, to understand what performance I can expect from elasticsearch given these constraints?
  3. what would be a better sharding strategy given the above spec (we have only 1 server in this setup)?

thanks in advance!

It sounds like you may have too many shards, especially given the small heap size. Read this blog post for a discussion around shards and sharding practices.

Until you can revise your sharding strategy, I would recommend increasing the heap size.
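
For example, if you could spare 4 GB for Elasticsearch (just an illustration, size it to what the machine can actually afford), you would set both values in config/jvm.options and restart the node:

# config/jvm.options -- keep the minimum and maximum heap equal
-Xms4g
-Xmx4g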

thanks, so I was actually thinking of reducing the shard count to 1 primary and 0 replicas. Since this is only a single node, I don't see a point in having more than a single shard. But is that true, or am I missing something?

With a single node there is no point in having replicas configured, as Elasticsearch will not be able to allocate them anyway. How many shards do you have in the cluster? What is the average shard size?

as I mentioned above, I have the default 5 shards per index. but the indices are rather small. none of them go beyond 1 GB in size...

That is wasteful, but you may also have too many indices. What is the output of the cluster stats API?
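
Assuming the default HTTP port, something like this will show the totals:

# cluster-wide shard count, index count and total store size
curl -s 'localhost:9200/_cluster/stats?human&pretty'

# per-index shard counts and sizes
curl -s 'localhost:9200/_cat/indices?v&h=index,pri,rep,docs.count,store.size'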

sorry for the weird way of sharing the output... but I just don't have internet access (or copy/paste, for that matter) on the machine at the moment... but here it is:

62 shards for 407 MB of data is far too much. That data would easily fit within a single shard.

I know.
this is a test machine. on production there will be more data.
but still it would be less than 1-2GB per index.
so that is why I was thinking to have just 1 shard per index.
(the separation to indices is mandatory, since those are different types of data from different applications).

are there any things to consider against using a single shard per index (other than HA, etc., since we already don't have that when running on a single node)?

That would be recommended. You may also want to consider having each index cover a longer retention period.
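
A minimal sketch of an index template for that could look like the following (on 5.6 the match field is still called template rather than index_patterns, and the metrics-* pattern is just a placeholder for your own index naming scheme):

curl -s -XPUT 'localhost:9200/_template/single_shard_defaults' -H 'Content-Type: application/json' -d '
{
  "template": "metrics-*",
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}'

Note that a template only affects indices created after it is in place; existing indices keep their current shard count.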

No.


thank you very much.

what about my other question at the beginning of the thread?
how can I effectively benchmark this system with real-life data, so that I can understand our limits with the given hardware?
any pointers on that one?


Have a look at the following resources:

https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing

https://www.elastic.co/webinars/using-rally-to-get-your-elasticsearch-cluster-size-right

https://www.elastic.co/elasticon/conf/2018/sf/the-seven-deadly-sins-of-elasticsearch-benchmarking

thank you very much.

Hi again,
so I've been doing some benchmarking, and I do think it's possible that our HDD is actually a bottleneck (maybe the main one) in this case.
I'm getting read speeds of around 60 MB/s when simply using hdparm, and something similar for write speed when using dd with 8k blocks. The results actually vary with different block counts: with 10k blocks (an 80 MB test file in total) the speed is around 250 MB/s, with 100k blocks it goes down to around 100 MB/s, and with 1000k blocks (an 8 GB file) it goes back up to around 250 MB/s. So I'm not sure I fully understand what's going on here; maybe the testing is faulty at some point, but the write speed never goes below 100 MB/s, that's for sure.
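
to be concrete, the tests were roughly along these lines (device name and file path are illustrative, not the exact ones I used):

# sequential buffered read test
hdparm -t /dev/sda

# sequential write test, 8k blocks, 100k blocks = ~800 MB file
dd if=/dev/zero of=/tmp/ddtest bs=8k count=100000 oflag=direct
rm /tmp/ddtest

I added oflag=direct here on the suspicion that part of the variance in my earlier numbers comes from the page cache; conv=fdatasync would be another option.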

anyhow, the question I'm struggling with is how do I find out what is the real bottleneck for elasticsearch?

I've run esrally with various tracks.
for example, I've run this track:

esrally --distribution-version=5.6.3 --track noaa --car 4gheap --challenge=append-no-conflicts --include-tasks="index" --track-params="bulk_size:10000,ingest_percentage:5"

and the results look like this:

can anyone give me a lead on what I can deduce from these and other metrics?
what is the capacity that I can count on, with the current setup?
what can I look to improve? what's most important to improve?

many thanks!

Elasticsearch performs a good amount of random-access reads and writes, so comparing its throughput to a test that reads and/or writes quite large blocks may not be very accurate. I would recommend monitoring I/O performance, e.g. using iostat, while your cluster is running to see what I/O latencies you get and how much iowait you are experiencing. This will give a good indication of whether the disk is the bottleneck or not (I suspect it likely is).
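
For example, something along these lines while the cluster is under its normal load:

# extended device statistics every 5 seconds
iostat -x 5

Watch await / w_await (average request latency in ms) and %util for the data disk, plus %iowait in the avg-cpu line; the exact column names vary a bit between sysstat versions.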

thank you. I have done some monitoring with iostat and other tools during normal operation of the cluster, and they all indicate that the CPU is experiencing some level of iowait: sometimes around 20%, sometimes less, and in some extreme cases I've seen even 70%.

but assuming this is the hardware I have to deal with, how can I best benchmark the cluster with esrally using my own data? I'm not certain that the existing tracks accurately represent my situation.

any pointers there?

thanks!

The best way is to create a custom Rally track that simulates your use case. Exactly how to best do that depends on the use-case you want to simulate. Can you provide some additional details about your use-case and the types of data you have?
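
To give a rough idea of what is involved: a custom track is essentially a track.json file plus a file of JSON documents to bulk-index (and usually an index.json with your settings and mappings). A very rough sketch, with all names as placeholders and the exact field names to be double-checked against the documentation of the Rally version you have installed, could look like this:

{
  "version": 2,
  "description": "network metrics ingest test",
  "indices": [
    { "name": "metrics-test", "body": "index.json" }
  ],
  "corpora": [
    {
      "name": "metrics-docs",
      "documents": [
        {
          "source-file": "documents.json",
          "document-count": 1000000,
          "uncompressed-bytes": 500000000
        }
      ]
    }
  ],
  "schedule": [
    {
      "operation": { "operation-type": "bulk", "bulk-size": 10000 },
      "clients": 2
    }
  ]
}

The document-count needs to match the number of documents in documents.json, and you could later add a search operation to the schedule to mimic your dashboard queries.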

well, the main use case is ingestion of metric data from multiple network devices: some security data, some network traffic data, etc. Most of the fields are either strings or various numeric fields.
what other additional details can I provide?

[Edit]:
and regarding the other side of things, this data is queried by custom dashboards (not Kibana) that are used as near-live monitors and generate some alerts from this data.

You could have a look at the rally-eventdata-track, which simulates indexing and querying of nginx access logs. You might be able to adapt this or use it for inspiration when creating your own track. You may even get valuable information from running it as is even though it does not accurately reflect your data and access patterns.
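
If you want to try it, I believe it can be cloned from GitHub (elastic/rally-eventdata-track) and pointed at with Rally's --track-path option, roughly like this (paths are illustrative; the repository README describes the available challenges and the more standard way of registering it as a track repository):

git clone https://github.com/elastic/rally-eventdata-track.git
esrally --distribution-version=5.6.3 --track-path=./rally-eventdata-track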

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.