Suggestion on Elasticsearch scaling and performance for log management

Blason · September 15, 2019, 5:25pm

Hi Folks,

I know this is a very vague question and it may have been answered a couple of times before, but I wanted another pair of eyes to look at my plan and vet it.

I am planning to build a log management/security analytics solution and will be collecting logs from around 100 devices [servers/routers/switches/firewalls], which I believe should not generate more than 15-20 GB per day.

I am planning on 4 nodes:

1x Logstash - 24 GB RAM
2x ES nodes [cluster] - 32 GB RAM each, with 3 TB of disk space
1x Wazuh/OSSEC node with 16 GB RAM, accepting messages from the servers and sending them directly to ES
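A back-of-the-envelope check of how long this disk would last, as a sketch under assumptions that are not in the post: the 3 TB is per ES node, indexed data takes about as much disk as the raw logs (the real ratio depends on mappings and compression), each index has 1 replica, and about 15% of disk stays free, roughly matching Elasticsearch's default 85% low disk watermark.

```python
def retention_days(daily_gb: float, disk_tb_per_node: float, data_nodes: int,
                   replicas: int = 1, headroom: float = 0.15) -> float:
    """Days of logs that fit before usable disk is full (rough estimate only)."""
    # Assumption: 1 TB = 1000 GB, and `headroom` is kept free on every node.
    usable_gb = disk_tb_per_node * 1000 * data_nodes * (1 - headroom)
    # Every GB ingested is stored once as a primary plus once per replica.
    stored_per_day_gb = daily_gb * (1 + replicas)
    return usable_gb / stored_per_day_gb


for daily in (15, 20):
    print(f"{daily} GB/day -> about {retention_days(daily, 3, 2):.0f} days of retention")
```

Plugging in your real on-disk size per day after a week of ingestion will give a much better number than the raw-log estimate.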

Now my questions are:

How many shards should be configured? Are 5 enough?
What should HEAP_SIZE be on ES? With 32 GB of RAM, is 16 GB enough?
Can I install Kibana on the primary Elasticsearch node, or do I need to install Kibana on a different server?
And will/should Kibana connect to the primary node of the cluster?
Similarly, will Logstash send data to the primary node as well?
Considering future growth and shard numbers, I can add more ES nodes to the cluster, right?
Any other optimization tips are much appreciated.

TIA
Blason R

DavidTurner · September 15, 2019, 6:12pm

1 sounds like enough (and this is the default in recent versions). 5 sounds like too many. Use ILM (index lifecycle management) to roll over to a new index when the current one reaches a reasonable size (say 40GB).
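The rollover suggestion corresponds to an ILM policy with a `rollover` action in the hot phase. The sketch below only builds the JSON body for `PUT _ilm/policy/<name>`; the policy name `logs-policy` and the optional 90-day delete phase are hypothetical choices for illustration, not something from this thread.

```python
import json
from typing import Optional


def rollover_policy(max_size: str = "40gb", delete_after: Optional[str] = None) -> dict:
    """Build an ILM policy body: roll over at max_size, optionally delete later."""
    phases = {"hot": {"actions": {"rollover": {"max_size": max_size}}}}
    if delete_after:
        # Hypothetical retention: remove indices once they reach this age.
        phases["delete"] = {"min_age": delete_after, "actions": {"delete": {}}}
    return {"policy": {"phases": phases}}


print("PUT _ilm/policy/logs-policy")  # hypothetical policy name
print(json.dumps(rollover_policy(delete_after="90d"), indent=2))
```

In 7.x the indices pick the policy up through the `index.lifecycle.name` and `index.lifecycle.rollover_alias` index settings (usually set in an index template), and the first index must be created with that alias marked as its write index.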

If your machines have 32GB of RAM then 16GB (50%) is the absolute maximum you should use. You may get better performance with a smaller heap.
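The sizing rule can be written down as a tiny helper. This is a sketch: it caps the heap at 50% of RAM (the rest is left to the OS filesystem cache that Lucene relies on) and also below the JVM's compressed-oops threshold, which is near 32 GB; the value 31 used here is a conservative assumption. It returns the maximum, not a recommendation, since a smaller heap may perform better.

```python
def max_heap_gb(ram_gb: int, compressed_oops_limit_gb: int = 31) -> int:
    """Upper bound for the ES heap: half of RAM, and below the compressed-oops cutoff."""
    return min(ram_gb // 2, compressed_oops_limit_gb)


def jvm_options(heap_gb: int) -> str:
    """jvm.options lines; Xms and Xmx are set equal so the heap never resizes."""
    return f"-Xms{heap_gb}g\n-Xmx{heap_gb}g"


print(jvm_options(max_heap_gb(32)))
```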

There's no such thing as a "primary node". Both nodes are equal from Elasticsearch's point of view, and you can send data and searches to either.

Yes. Note that you cannot build a fault-tolerant cluster with only two nodes: you need at least three master-eligible nodes for resilience. After that you can add data nodes as needed. You might also want to segregate your data nodes into a hot/warm architecture.
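The reason two nodes are not enough is majority arithmetic: electing a master needs a strict majority of the master-eligible (voting) nodes. The sketch below uses plain majority counting and ignores the details of how Elasticsearch 7.x adjusts its voting configuration, which do not change the conclusion.

```python
def tolerable_master_failures(master_eligible: int) -> int:
    """How many master-eligible nodes can be lost while a majority can still elect a master."""
    quorum = master_eligible // 2 + 1  # strict majority
    return master_eligible - quorum


for n in range(1, 6):
    print(f"{n} master-eligible node(s): quorum {n // 2 + 1}, "
          f"tolerates {tolerable_master_failures(n)} failure(s)")
```

Two nodes tolerate zero failures, the same as one node, while three tolerate one; a fourth node adds no extra tolerance, which is why odd counts are usual.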

Blason · September 15, 2019, 6:22pm

Super!! The only confusing part for me is shards. How would 1 shard be enough for my needs?

My ultimate goal here is to build a cluster that is:

Fault tolerant
Scalable
And, of course, optimally used

DavidTurner · September 15, 2019, 6:34pm

I don't really understand the question. What is making you think that 1 shard per index will not be enough?

Blason · September 15, 2019, 6:43pm

If I am not wrong, one shard will not span multiple nodes, correct? And considering future growth, if the data volume increases and I introduce 2 more nodes, 1 shard will not be enough, right?

DavidTurner · September 15, 2019, 6:47pm

Ok I think I see. By "number of shards" I mean the number set by the index.number_of_shards setting. 1 is the default for that and that sounds reasonable for your case. But for fault tolerance each shard must have a replica, and this is controlled by the independent index.number_of_replicas setting. 1 is the default for that too, meaning that each shard will have a copy on both nodes, and that sounds good for you too.
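How `index.number_of_shards` and `index.number_of_replicas` translate into shard copies can be sketched as follows. It relies on one allocation rule: Elasticsearch never places two copies of the same shard on the same node, so replicas beyond `data_nodes - 1` stay unassigned. The function and its return fields are illustrative, not an Elasticsearch API.

```python
def shard_layout(primaries: int, replicas: int, data_nodes: int) -> dict:
    """Count shard copies for one index and how many can actually be allocated."""
    copies_per_shard = 1 + replicas
    # At most one copy of a given shard per node.
    assignable = min(copies_per_shard, data_nodes)
    return {
        "total_copies": primaries * copies_per_shard,
        "unassigned_copies": primaries * (copies_per_shard - assignable),
        "nodes_used_per_shard": assignable,
    }


print(shard_layout(primaries=1, replicas=1, data_nodes=2))
print(shard_layout(primaries=1, replicas=2, data_nodes=2))
```

With 1 primary and 1 replica on two nodes, every index lives on both nodes. Because ILM rollover keeps creating new indices, adding nodes later still spreads the load: the shards of many small indices get balanced across the cluster even though each index has one primary.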

gavenkoa · September 16, 2019, 1:27pm

Why do you need Logstash? I think it's rather heavyweight software because it runs on a Ruby interpreter...

ES has had ingest nodes since 5.x, and they support 30 or so processors.

Even enriching IP addresses with geo data is covered (ES has a geoip ingest processor), so I don't understand why you would spend 24 GiB of RAM on Logstash when an ES ingest node can do the job.
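An ingest pipeline is defined as a JSON body sent to `PUT _ingest/pipeline/<id>`. The sketch below builds one that parses a line with the `grok` processor and then enriches an extracted IP with the `geoip` processor. The pipeline id `network-syslog`, the log line format, and the field names are hypothetical; adjust the grok pattern to what your devices actually send.

```python
import json


def syslog_pipeline() -> dict:
    """Hypothetical pipeline: grok-parse a syslog-style line, then geo-enrich source_ip."""
    return {
        "description": "Parse device logs and add geo data for the source IP",
        "processors": [
            {"grok": {
                "field": "message",
                # Assumed line format: "<timestamp> <host> <ip> <rest of message>"
                "patterns": ["%{SYSLOGTIMESTAMP:syslog_ts} %{SYSLOGHOST:device} "
                             "%{IP:source_ip} %{GREEDYDATA:msg}"],
            }},
            # Skip documents where grok did not extract an IP.
            {"geoip": {"field": "source_ip", "ignore_missing": True}},
        ],
    }


print("PUT _ingest/pipeline/network-syslog")  # hypothetical pipeline id
print(json.dumps(syslog_pipeline(), indent=2))
```

Note that the pipeline only transforms documents that arrive over the REST API; it does not itself receive data from the network.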

Blason · September 17, 2019, 3:06am

As for the ingest node, I am not so sure it can listen on a port for incoming data from my network devices. For servers it would work, since I will be installing shippers, but what about network devices?

gavenkoa · September 17, 2019, 8:44am

Pick a Protocol. Or Build Your Own

Packetbeat is also a library. It supports many application layer protocols, from database to key-value stores to HTTP and low-level protocols. Choose the one you need or add your own by submitting a pull request.

system · October 15, 2019, 8:44am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
