Clustering question (storage needs and performance)

Hi,

currently we have one big machine running a single instance of Logstash, Kibana and Elasticsearch.
It has 128 GB RAM, 700 GB storage and 8 CPUs. It's a virtual machine, and the disk is located on a NAS.

I think we are currently not working with replica shards. At least I get output of the following kind:

GET _cat/shards
other-prod-2017.12.05            0 p STARTED     285311 110.7mb 127.0.0.1 node-1
other-prod-2017.12.05            0 r UNASSIGNED                           
other-staging-2017.09.30         0 p STARTED        279 238.9kb 127.0.0.1 node-1
other-staging-2017.09.30         0 r UNASSIGNED                           

Now I am thinking about building a cluster, maybe on a single machine. My goal is to increase performance for indexing and querying.

  • If I build a cluster of two nodes, for example, then I need twice the storage, correct?
  • Will it decrease indexing speed because data needs to be written to both primary and replica shards?
  • Increasing the number of replicas will increase query performance as long as the disk is not fully loaded, right?
  • Is it possible for multiple nodes to use the same replica shards (because we store on a NAS), or will I have a physical bottleneck in my NAS because the file only exists once on disk?

I would like to understand what results my changes will probably have, because although we are on a VM, our datacenter sends us a bill for each change. So I do not want to waste money on trial and error :wink:

Thanks, Andreas

If you are running into heap pressure, adding a second node may help. From an indexing performance perspective, however, I would not necessarily expect adding a second node on the same server to help.

Have you identified what is limiting your indexing performance? As you are using a NAS, which is not ideal, I would recommend monitoring disk I/O and iowait.

My main problem is request time. If I query some complex dashboards in Kibana for a larger time interval (e.g. one week), I get a timeout (60s) on the first try. Then I wait a minute or so, retry the query, and get the result. I think that's because the result is cached.

We especially have these issues when there are incidents on the system we monitor via Kibana / Elasticsearch and more users than usual are using Kibana.

I don't know if I am interpreting all the values from monitoring correctly, so to give you a picture of our system, I will post some screenshots:



I don't know if I picked a good example of when our issues occurred.
But looking at these graphs, heap does not seem to be an issue.

I currently do not have historical data about disk I/O and iowait. Can it be measured with Metricbeat? If so, I could enable it quite easily.

By the way, I am currently working on an upgrade to 6.0.1, but I assume the same steps will improve performance on both versions.

Based on the graphs above it looks like you have a lot of quite small shards, which probably does not help with query performance. I would recommend reading this blog post on shards and sharding practices.
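
To put numbers on that, the cat APIs you already used for the shard listing can also give a quick overview of index sizes, e.g. (the *prod* pattern is just an example):

GET _cat/indices/*prod*?v&h=index,pri,rep,docs.count,store.size&s=store.size:desc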

You do not necessarily need historical data, even though it of course is good to have - just take a few samples while you index and issue an expensive query at the same time. That should tell you how your storage is performing.
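
One way to take such point-in-time samples from Elasticsearch itself (in addition to top or iostat on the host, which is where you would see iowait) is the nodes stats API; on Linux the fs section should include disk I/O counters:

GET _nodes/stats/fs,os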

Thanks for the reply.

I need advice about sizing my shards; especially the migration to 6.0.1 with its single-type-per-index policy seems counterproductive. If I have too many shards now, then I will have even more with one type per index.

Changing from daily to weekly rotation requires application changes, because I implemented querying only the last 2 days' indices. That improved performance a lot on 5.1.2 compared to querying all indices.

Is it still the case in ES 6.0.1 that querying a time-based index which is outside the requested time range takes noticeable time? Or can I drop my optimization because it is not needed with the current version?

If you have data of different types in a 5.x index, you do not have any mapping conflicts between them, as this is checked at index time. You can therefore switch to a single type and just add a separate field that identifies the type. You can then filter on this field instead of the traditional type in Kibana and in custom queries. Switching to a single type should therefore not necessarily lead to more indices.
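
As a sketch of what that filtering could look like (the index pattern, field name and value are all made-up examples, assuming the field is mapped as a keyword):

GET logs-*/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "logfileType": "app1" } }
      ]
    }
  }
}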

In 6.x, querying a large number of shards has been made much more efficient, and Kibana therefore no longer determines the exact indices to query before sending the queries. I would recommend trying to address the full index pattern and seeing how it compares to what you are currently doing. If you can change to this, you have more flexibility around how you manage your indices, e.g. through the use of the rollover index API.
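
For reference, a minimal rollover setup could look roughly like the sketch below (alias and index names are made up, and the condition values are only placeholders; as far as I know the conditions supported in 6.0 are max_age and max_docs):

PUT logs-000001
{
  "aliases": {
    "logs-write": {}
  }
}

POST logs-write/_rollover
{
  "conditions": {
    "max_age": "7d",
    "max_docs": 50000000
  }
}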

Can you give any advice, apart from just trying it out, on which of the following options will have better performance on 6.0.1 and better compression on disk?

  • each old type goes to an index named %type%-%datepattern%
    -> so maybe I switch from daily to weekly, maybe using the rollover index API you mentioned.
    -> currently we have 25 types
  • keeping multiple types per index and putting the type into a separate field, e.g. logfileType, maybe with some optimizations around the rollover.
    -> I think we have fewer indices here, but the indices are more inhomogeneous, meaning a lot of fields are not filled because they depend on the source logfile / logfileType.

If possible I would like to have great performance and low storage consumption :slight_smile:

Larger shards generally compress better and result in less overhead than small shards. Sparse values used to have the potential to result in a lot more disk space being used, but this has been improved in Elasticsearch 6.0. I would therefore think option 2 is better, but the best way to find out is to test.

thanks.

We have two kinds of log messages:

  1. metrics (measured once a minute) from multiple applications, which we show on the main dashboards to spot anomalies in request count, output of Metricbeat, etc.
  2. logs of processed messages, where we log each message with its processing times, etc.

That's why I originally divided them into different indices (we don't mix them in dashboards often, only sometimes).
Log volume per day:

type 1:  0.7 GB
type 2: 10.0 GB

In most dashboards we only show data of either type 1 or type 2, never both.
That's why I split them into different indices.
Other splits are caused by different retention times of the data.

So if I understand you correctly, you don't expect worse performance when querying type 1 data if I add it to the type 2 index?

If so, I will first try option two (multiple types, identified by a new field, in one index) and compare 6.0.1 performance and disk usage against 5.1.2.

You are showing 2 types here but mentioned that you have 25 types earlier. If you only have 2 types of very different data, separating them into different indices could be wise, especially if they have different retention periods. If it is 25 types, you run the risk of having a lot of small shards, which can be inefficient.

By "types" in my last post I meant sorts of indices.
There are different types of metrics, for example, where I needed multiple real (mapping) types. I grouped them into a perf-* index (referred to as type 1 above).

The real logs also have different types, because each application has its own types. Since all of them are Tuxedo applications, I grouped them into a tux-* index (referred to as type 2 above).

So my concern with merging the type 1 index into the type 2 index is whether it gets slower when I query only for a (real) type which was formerly in the type 1 index.

I hope it is now clearer what I meant :wink:

thanks a lot

Here are some more details of our current sharding and indexing strategy:

Due to different retention times we have an index name part "prod-" and an index name part "staging-".

After that, we organize our 25 types into the following "index types", which are reflected in the index prefix:

size on prod:

index prefix: tux-:     1 primary shard, 0 replica,  9 types,  8.4 million docs / 10 GB   per day
index prefix: other-:   1 primary shard, 0 replica,  5 types,      300k docs    / 0.15 GB per day
index prefix: perf-:    1 primary shard, 0 replica, 11 types,  1.5 million docs / 0.8 GB  per day

size on staging:

index prefix: tux-:     1 primary shard, 0 replica,  9 types,  730k docs / 0.7  GB  per day
index prefix: other-:   1 primary shard, 0 replica,  5 types,  0.7k docs / 0.01 GB  per day
index prefix: perf-:    1 primary shard, 0 replica, 11 types,  800k docs / 0.3  GB  per day

Since we currently use daily index rotation, the final index names look like this:

tux-prod-2017.12.12, tux-staging-2017.12.12, perf-prod-2017.12.12, etc.

To avoid too many small indices, I plan to get rid of the "other-" indices, move their data into the tux indices, and rename those to "details".

So the target is to have only perf-[staging/prod]-[timepattern] and details-[staging/prod]-[timepattern].
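
One way I could move the existing other- data into the new details indices is probably the reindex API; a rough sketch (index names follow the naming plan above, the concrete date is just an example):

POST _reindex
{
  "source": { "index": "other-prod-2017.12.12" },
  "dest":   { "index": "details-prod-2017.12.12" }
}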

Now I am looking for advice on weighing rollover frequency against shard count.

Our main use case is Kibana, so I want to focus on that.
Our main dashboard, which is refreshed every minute by about 3 to 10 users, contains 15 visualizations.
The index patterns only narrow the search down to the index prefix (perf- or tux-).
That way we can use the same dashboards for prod and staging and select via a filter on a field which stage of our system we would like to see.

Most queries are only fetching data for up to 12 hours, especially in our main dashboards.
Sometimes we fetch data for a week or a month.

Currently, as a normal use case, I see the following in top:

Kibana monitoring shows me the current picture, in case it helps:

Referring to our main dashboard:

index data source             count of visualizations
perf                          8
tux                           7

What do Kibana and Elasticsearch do when refreshing the dashboard?
Will the queries run in parallel against the single-shard tux index, or will they be serialized and run one after another? Keep in mind that we have multiple users with auto-refresh enabled.

This leads to the next question:
Might queries be faster with 3 shards and weekly / size-based rotation instead of daily rotation with one shard?
As I understand it, 20-40 GB per shard is a good value.
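
If I test the 3-shard variant, I guess an index template for 6.x would look roughly like this (names and values just illustrate the idea; as far as I know, in 5.x the pattern field is called template instead of index_patterns):

PUT _template/details
{
  "index_patterns": ["details-prod-*", "details-staging-*"],
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 0
  }
}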

Hi,

Just giving my 2 cents since I'm running a fairly similar use case (metrics & logs for our main application, resulting in a little more than 500 million events per week): you probably need to scale out your cluster and get faster storage. I'm running a 3-node ES cluster with 4 GB RAM and 200 GB of SAN storage (15K rpm disks, not SSDs) per VM, plus a separate VM for Kibana, also with 4 GB RAM and an Elasticsearch "search" node, and I'm not getting any of the spikes you do, even though I've got so much less RAM than you do.

If I were you, and since you mentioned that your datacenter charges you for any change (as they should), including tests, I'd grab 3 PCs with SSDs, set up an ES cluster on them and run some tests, just to see whether that would improve your situation without spending too much time on manual index optimization (which you shouldn't need anyway once you've defined your mappings). Or, if you don't have PCs lying around that you can use for that purpose, just run those tests on AWS; even though it'll cost you a bit of money, it'll probably be a lot cheaper than your own datacenter, especially if you plan and automate all your deployments and tests beforehand. For your tests I suggest you look into Rally (https://www.elastic.co/blog/announcing-rally-benchmarking-for-elasticsearch) if you haven't already.

Going back to your original questions:

  • If I build a cluster of two nodes, for example, then I need twice the storage, correct? No, you only need additional storage if you enable replicas; if you don't, the data will simply be spread across the nodes. That being said, I'd enable replicas simply for data safety and rolling upgrades (see the sketch after this list). Downtime is much more expensive than storage anyway.
  • Will it decrease indexing speed because data needs to be written to both primary and replica shards? Probably the opposite, since write operations will be spread out across the cluster's nodes. At least in my experience.
  • Increasing the number of replicas will increase query performance as long as the disk is not fully loaded, right? Not sure there; I haven't really looked into that since we don't have any query performance issues (yet).
  • Is it possible for multiple nodes to use the same replica shards (because we store on a NAS), or will I have a physical bottleneck in my NAS because the file only exists once on disk? While technically possible, you will definitely get a physical bottleneck, unless you've got a really fancy NAS full of SSDs with a high-bandwidth, low-latency network (which I'm guessing isn't the case, since NASes are usually designed to be cheap with as many TB as possible).
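
For completeness: enabling a single replica on your existing indices once you have a second node is just an index settings change, something like this (the *prod* pattern is only an example):

PUT *prod*/_settings
{
  "index": {
    "number_of_replicas": 1
  }
}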

Quick question for you: how do your indexing and search latencies look (in the cluster Overview)?

Best regards,
Nicolas

For the normal use cases, if we have no incident -> fewer users on Kibana -> response times for 1 day are acceptably fast.

I ran some requests (in an additional browser tab) to check out the impact of running queries over 7 days.

Our main dashboard took a while but came back before the 60s timeout. Another dashboard, which is used less often, ran into the timeout. After a while I retried and it worked (thanks to the cache, I think).

Watching top during the test I noticed the following:
First, I/O wait goes up to 30% while CPU is at 30%.
A bit later, I/O wait goes down to < 5% and CPU goes up to 50-80%.

Elasticsearch overview (I started the test shortly after the red marking):

Here is the equivalent from Kibana monitoring:

Hi,

Yeah, those kinds of top stats are typical of NAS access: first your ES node is waiting on the NAS to answer (hence the 30% I/O wait), and then once it gets the data it starts munching on it.

As you can clearly see in your graphs, when you're running those 7-day queries your indexing rate drops to 0 and your indexing latency goes up to 3 ms, so you're definitely hitting some kind of storage bottleneck.

Here's how our graphs look when I run the Metricbeat MySQL dashboard with a 1 min auto-refresh:


As you can see, we also see an impact on our indexing rate and latency, but it's far from hitting us as hard as it hits you, even though we have that much less RAM. Search latency and client response time would probably be better with more RAM (gotta try that out since we recently added RAM to our physical infra).

Also, I noticed that half of your shards are unassigned, so unless something is horribly wrong and those are half of your primary shards, you already have replicas configured on your current indices.
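
If you want to get rid of those unassigned replicas while you're still on a single node, you could also explicitly set the replica count to 0 (and bump it back up once you add a node); something like this (the index patterns are just examples):

PUT tux-*,perf-*,other-*/_settings
{
  "index": {
    "number_of_replicas": 0
  }
}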

Thanks, that sounds interesting.
I should ask our DC what an upgrade from NAS to SAN would cost.

About the unassigned shards:

GET _cat/shards/*prod*
-> results in: 

tux-prod-2017.12.10   0 p STARTED    2027052   1.9gb 127.0.0.1 node-1
tux-prod-2017.12.10   0 r UNASSIGNED                           
tux-prod-2017.11.29   0 p STARTED    8764316   9.8gb 127.0.0.1 node-1
tux-prod-2017.11.29   0 r UNASSIGNED                           
perf-prod-2017.12.09  0 p STARTED    1366596   726mb 127.0.0.1 node-1
perf-prod-2017.12.09  0 r UNASSIGNED                           
perf-prod-2017.11.17  0 p STARTED    1051991 537.4mb 127.0.0.1 node-1
perf-prod-2017.11.17  0 r UNASSIGNED                           

...

We are not actually using replicas yet; since there is only one node, the replica shards cannot be assigned anywhere. As I see it, that's the reason why exactly 50% of our shards are unassigned.

For the sake of completeness, you should also ask them how much it'd cost to get dedicated boxes with internal storage; that might end up being cheaper than SAN-backed VMs.

Just for comparison: how many CPUs are inside each of your nodes?

2 vCPUs per VM, the physical CPUs underneath are Intel Xeon CPU E5-2620 v2.

I also double-checked the memory allocated to the VMs: it's actually 8 GB, half of it allocated to the ES heap. That's still a lot less than your 128 GB :wink:
