Elasticsearch Indexing Rate

I am trying to do some performance tuning on my environment.
Some initial details:

  • 5 Node cluster - VMs
  • Each VM has 8 cores
  • Each VM has 16GB of RAM
  • One Elasticsearch node per VM. Each ES node has 8GB of RAM allocated and writes to a 'local' disk (as local as I can get from the internal cloud team where I work)

I am collecting from between 50 and 60 servers with Logstash instances shipping to one of 3 Redis instances via a VIP. I get fairly even distribution of data. The Redis instances are not clustered.

From each Redis instance, I have one Logstash agent that pulls from it, applies filtering logic to the records, and inserts them into Elasticsearch. The elasticsearch output plugin is configured with all 5 hosts in its 'hosts' parameter.
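For reference, each indexer pipeline looks roughly like this (a sketch against the Logstash 2.x-era plugin options; the list key and filter contents are placeholders, and the Redis host is local since they're co-located):

```
input {
  redis {
    host      => "127.0.0.1"    # Redis is co-located on this node
    data_type => "list"
    key       => "logstash"     # placeholder list key
  }
}
filter {
  # grok/mutate/etc. filtering logic goes here
}
output {
  elasticsearch {
    hosts => ["es1:9200", "es2:9200", "es3:9200", "es4:9200", "es5:9200"]
  }
}
```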

The 3 Redis instances are on different servers, but they are co-located on the Elasticsearch nodes. The same goes for the Logstash indexers.

------------    ------------    ------------    ------------    ------------
|          |    |          |    |          |    |          |    |          |
|   ____   |    |   ____   |    |   ____   |    |          |    |          |
|  | Rd |  |    |  | Rd |  |    |  | Rd |  |    |          |    |          |
|  |____|  |    |  |____|  |    |  |____|  |    |          |    |          |
|   ____   |    |   ____   |    |   ____   |    |          |    |          |
|  | LS |  |    |  | LS |  |    |  | LS |  |    |          |    |          |
|  |____|  |    |  |____|  |    |  |____|  |    |          |    |          |
|   ____   |    |   ____   |    |   ____   |    |   ____   |    |   ____   |
|  | ES |  |    |  | ES |  |    |  | ES |  |    |  | ES |  |    |  | ES |  |
|  |____|  |    |  |____|  |    |  |____|  |    |  |____|  |    |  |____|  |
|          |    |          |    |          |    |          |    |          |
------------    ------------    ------------    ------------    ------------

As you can see from the diagram, I have 3 servers where processes are co-located.
This is something that will change soon.

The problem I've been having is that I can't seem to break past an average of about 1,200-1,500 messages per second on my indexing rate.

I had a large volume of data come in a few days ago as part of some testing, and I ended up with about 5-6 million log messages sitting across my 3 Redis instances waiting to be indexed. At the time, I only had 4 cores on each machine. Since then, I have grown each machine from 4 to 8 cores and set up much better monitoring of my server stats to try to find any bottlenecks.

I tried many different things, but no matter what, nothing seemed to affect the ingestion rate:

  • Added a threads option to the redis input block.
  • Took another server (not in the diagram above) and stood up more Logstash indexers to pull from the different Redis instances.
  • Read through the Elastic guide for tuning ES.
  • Made sure I had no swapping, and increased the index thread pool size.

I understand that this post doesn't provide enough information for someone to really and truly diagnose my problems, but I felt like it would be good to get it all written down, at least for my own benefit. And, maybe someone can point out something I'm just overlooking.

In the meantime I'm continuing to gather metrics on each piece of my stack to find the bottlenecks.

Some questions I have:
Redis - By default, Redis has its save (RDB snapshot) and AOF options enabled. For people who are using Redis and getting high indexing rates, is this something that gets turned off? I have noticed that when Redis does a snapshot save, it stops serving reads/writes until the snapshot is complete.
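If snapshotting turns out to be the culprit, persistence can be switched off entirely in redis.conf (a sketch; this is only acceptable if losing the queued events on a crash is tolerable, since Redis is acting as a transient buffer here):

```
# redis.conf: disable RDB snapshots and AOF so the instance is memory-only
save ""
appendonly no
```

The same can be applied to a running instance with `redis-cli CONFIG SET save ""` and `redis-cli CONFIG SET appendonly no`.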

There are a lot of write-ups about how to maximize disk I/O, but in a corporate environment where you really don't have much insight into how the disks are set up, how do you troubleshoot that? I have monitors set up with Nagios and Check_MK to watch I/O reads/writes per VM, but I don't know how to interpret that information. If the storage is a SAN, then my data could be striped across many disks anyway. So should I still expect regular, single-spinning-disk write speeds?

When I looked at CPU utilization during the heavy load times, I didn't see any CPUs being maxed out. Utilization was hitting roughly 70-80% on average, but to me that doesn't mean 'maxed'. At the time they were 4-core machines, and the load average on some was spiking up to around 5 every so often, but the 10-15 minute load averages were around 3.

My RAM usage doesn't seem unhealthy. I am running 8GB JVM heaps on 16GB boxes. I understand this is one area where I may really get a performance boost once I move Redis and Logstash off of the ES nodes and the filesystem cache is dedicated to what ES is doing, but I can't imagine that giving me 2-3 times the performance.


So there we have it.. a lot of thoughts, and information. Thanks for reading if you did indeed read this far, and I appreciate ANY feedback.

What do the indexes you're writing to look like? # shards, replicas and such? Have you benchmarked a single node/single shard to see what kind of indexing throughput is achievable? Have you checked iowait numbers to see if disk is showing signs of being a problem? Having to use whatever virtual storage they're providing is definitely a big question mark that could really limit capabilities.


What do the indexes you're writing to look like?

Daily indices, 3 shards each with 1 replica (so 6 shards per day).

Have you benchmarked a single node/single shard to see what kind of indexing throughput is achievable?

No, I haven't.
Can you point me to some good standards for doing so? I get the idea around why to do it, but I just want to make sure that when I do it, I'm not messing up my results.

Have you checked iowait numbers to see if disk is showing signs of being a problem?

I'll be honest here: using iostat and interpreting iowait is not something I'm very strong in.
However (and I'm double-checking this right now), the storage team said that when I was experiencing my heavy load, they saw very low wait times, indicating that I was not being I/O bottlenecked. I'm trying to get some more info on this now, and I'll update with the specifics if and when I get them.

Having to use whatever virtual storage they're providing is definitely a big question mark that could really limit capabilities.

If this is indeed what's slowing me down, then at the very least I'd like to be able to prove it. What kind of metrics would you recommend I gather to determine that?
Right now I'm able to get a per-VM measurement of reads/writes per second, but since I started gathering this, I haven't experienced load as heavy as last week's, and I have no real baseline to compare what I'm seeing against.

Just try running iostat 5 for a while and see what level of I/O you're getting (ignore the first sample it reports). In this instance you don't need specifics; you're just trying to figure out whether disk is even a candidate. If the iowait number is greater than 1% it might be worth digging deeper; otherwise the problem is likely elsewhere.
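To make the check concrete, here is one way to pull the %iowait column out of an iostat CPU sample; the awk parsing is demonstrated against a canned two-line sample so it can be sanity-checked without a live run:

```shell
# On a node under load, sample CPU stats every 5 seconds (skip the first sample):
#   iostat -c 5
# The avg-cpu data row lists: %user %nice %system %iowait %steal %idle,
# so %iowait is field 4. Below, the parsing is fed a canned sample.
sample='avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          12.50    0.00    3.20    0.85    0.00   83.45'
iowait=$(printf '%s\n' "$sample" | awk 'NR==2 {print $4}')
echo "$iowait"   # 0.85 in this sample - under the ~1% threshold, so disk is likely not the problem
```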

The problem with asking the people who run the storage about it is that they're going to look at the storage itself. You have a whole stack of software and a network between your OS and that storage layer that could be causing issues, so you have to see what gets reported by your OS. My bet is that it's fine, but it's such an obvious potential issue that it's the first thing I would try to rule out.

Testing the raw performance of a single node/single shard will give you something to base further analysis on, and it excludes Redis/Logstash from the equation. It's also a good opportunity to watch CPU and disk performance. It doesn't have to be fancy, just something that can generate arbitrary data similar to what your real data looks like. We use Teraslice for this, but we built that tool so it's familiar to us; I'm sure there are other good tools around. You're just trying to find out how many docs/s you can store.

Also be aware that with 5 nodes and 6 total shards, one node will have 2 shards on the same disk. It would probably be better to have 5 shards with 1 replica so that each node ends up with 2 shards and you have even distribution across nodes.
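If you go that route, the shard count for future daily indices can be set once via an index template rather than per index (a sketch against the 2.x-era template API; the template name and index pattern here are assumptions):

```
PUT _template/daily_log_shards
{
  "template": "logstash-*",
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}
```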

I'm going to be working on watching iostat this week when I can get some more load coming in. I expect that to be within a couple days. I'll update when I get some info from that.

As for doing a single node/single shard test, I feel like I don't know exactly where to start. I'm hesitant to set that up on any of my production servers, but only my production servers have 8 cores.
Would it matter if I ran it on a 4-core server just for baseline purposes? Or is that going to be too much of a variance to depend on as a baseline? (I.e., if I run it on a 4-core machine, could I essentially double the numbers to estimate how it would run on an 8-core box? Obviously that wouldn't be exact, but would it be close enough?) I have recently created a log-generation script that simulates the logs we generate here - same formats and all - for testing my parsing patterns. So once I determine a good place to run that test, I should be able to get it knocked out in a day or two.
For that test, am I basically just going to fill up the shard as fast as I can and watch what kind of CPU/disk/memory performance I'm getting?
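One low-tech way to drive such a test would be to pre-build _bulk payloads from the generated logs and time how fast a node accepts them (a sketch; the index name, type, and document shape are placeholders):

```shell
# Build a _bulk payload: NDJSON with one action line plus one source line per doc.
: > bulk.ndjson
for i in $(seq 1 1000); do
  printf '{"index":{"_index":"ingest-test","_type":"logs"}}\n' >> bulk.ndjson
  printf '{"@timestamp":"2016-01-01T00:00:00Z","message":"sample log line %s"}\n' "$i" >> bulk.ndjson
done
wc -l < bulk.ndjson   # two lines per document
# Then time the ingest against the test node, e.g.:
#   time curl -s -XPOST 'http://localhost:9200/_bulk' --data-binary @bulk.ndjson
```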

Regarding shard balancing: I plan on growing my prod cluster to 8 Elasticsearch machines in the next week or two. Should I plan on going 4x1 on my shards in anticipation of that? Or 8x1? Or is that highly dependent on the results of the single-node/single-shard test?

Thanks for your help on this.


We have been doing some benchmarking in our setup in order to find out how to handle indexing 20,000 log events per second.
Basically, here is what we decided to do in order to cope with this amount of data:
First, if you use daily indices, set the replica count to 0 on the current day's index and use a cron job to set it to 1 (or whatever you want) the day after. This has a huge impact on performance; the only problem is that you don't secure your data for the current day. It's a trade-off to make.
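As a sketch of that cron job (the index naming pattern, schedule, and GNU `date -d` are assumptions; adjust to your own setup):

```
# crontab: at 00:05, add a replica to yesterday's daily index
# (% must be escaped in crontab entries; assumes GNU date)
5 0 * * * curl -s -XPUT 'http://localhost:9200/logstash-'"$(date -d yesterday +\%Y.\%m.\%d)"'/_settings' -d '{"index":{"number_of_replicas":1}}'
```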
The ES docs say a bulk index request should be between 5 and 15 MB; our tests showed that with our application, 11 MB was good.
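The doc-count equivalent of a byte target depends on your average event size; a quick back-of-the-envelope (the 600 bytes per event is an assumed figure - measure your own):

```shell
# Roughly how many docs per bulk request reach an ~11 MB payload,
# assuming an average indexed event of ~600 bytes
target_bytes=$((11 * 1024 * 1024))
avg_event_bytes=600
docs_per_bulk=$((target_bytes / avg_event_bytes))
echo "$docs_per_bulk"   # 19223
```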
We also discovered that it's very efficient to do parallel indexing directly against each data node, but the number of parallel bulk indexing jobs should not exceed the number of CPUs.
We also found that the number of shards should be equal to the number of CPUs.


Can you speak more to the parallel indexing and what specifically you're referring to here?
Is this going to be the 'workers' property in the Logstash Elasticsearch output plugin?
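For reference, here is roughly what I'd be tuning (a sketch against the 2.x-era options; the worker counts are guesses, and whether this actually helps is part of what I'm asking):

```
# Start the indexer with one filter worker per core:
#   bin/logstash -w 8 -f indexer.conf

output {
  elasticsearch {
    hosts   => ["es1:9200", "es2:9200", "es3:9200", "es4:9200", "es5:9200"]
    workers => 4    # parallel output threads issuing bulk requests
  }
}
```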

I had a spike in load today, and what I noticed is that the I/O on each node during the load was about 10MB/s per shard. So, since I have 1 node that holds 2 shards, that node was doing 20MB/s of I/O. I have definitely seen the disks sustain higher write speeds - near 100MB/s - so I don't believe I was bottlenecked there.

However, my CPU usage spiked to 80% for the duration of the load, but only on 2 of the nodes.
I would have hoped it would have been split evenly across all 5 nodes (or as evenly as possible) similarly to the way the IO was.

Actually, we don't use Logstash; we have our own parsers and loaders that handle the parallel indexing.

Regarding the disks: during our tests we were using slow disks (7,200 RPM) in a virtualized environment, and even then the disk was never the bottleneck.

Indexing is clearly a CPU issue.
You need to find a way to distribute and parallelize the load to leverage the CPUs, but I don't know how to do this with Logstash.