Is it really that painful? Need advice


(Alex) #1

Hello all,

Sorry to bother you. Just wondering, and by wondering I mean smashing my head on the computer desk over and over, thinking and probably overthinking it...

Let's say I have access to one server: 16 GB of RAM, 4 CPUs at 3 GHz.

  • NOT TWO, one

I would like to use the ELK stack for traffic monitoring, using logs from a FortiGate device. The logs are currently sent to a syslogd server (the same machine described above), and Logstash forwards them to ES so I can finally visualize them in Kibana!

Let's also say that I currently have ~25 GB of logs per day, stored in a daily index, AND we will more than likely reach ~100 GB of logs per day (per daily index).

Now, to add complexity: in the near future we would like to see up to 30 days of log information in a single visualization dashboard. (Currently, at ~25 GB per day, we cannot; we get a "freaking 30000 ms timeout" in Kibana for a dashboard covering only 7 days. ******** Any help is welcome here. ********** Is there any way to generate a view, no matter how long it takes? Like a monthly overnight report? I couldn't care less how long it takes, as long as I can see it when needed.)
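On the overnight-report question: one option is to skip the dashboard and run a single aggregation-only search from a scheduled job, outside Kibana, since a search with size 0 returns only the aggregation buckets and no individual documents. A minimal sketch in Python, assuming the logs carry a @timestamp field (the Logstash default) and Elasticsearch 2.x-era date_histogram syntax (newer versions replace "interval" with "calendar_interval"); the function name is hypothetical:

```python
import json


def monthly_report_query(timestamp_field="@timestamp", days=30, interval="day"):
    """Build an aggregation-only search body covering the last `days` days.

    Assumes 2.x-style date_histogram syntax ("interval"); hypothetical helper.
    """
    return {
        "size": 0,  # return no raw hits, only the aggregation buckets
        "query": {
            # rounded to the start of the day, `days` days ago
            "range": {timestamp_field: {"gte": f"now-{days}d/d"}}
        },
        "aggs": {
            "per_day": {
                "date_histogram": {"field": timestamp_field, "interval": interval}
            }
        },
    }


if __name__ == "__main__":
    # Print the body so a nightly job can POST it to the daily indices' _search endpoint.
    print(json.dumps(monthly_report_query(), indent=2))
```

The printed body could be POSTed to the daily indices (for example a hypothetical fortigate-* pattern) with curl or any HTTP client using a generous client-side timeout, and the resulting buckets saved for the monthly report.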

Now... I've been through almost everything to optimize the stack:
  • dedicated RAM
  • mlockall: true
  • disabled swap in fstab
  • changed the refresh interval
  • etc.

I've tried everything I can think of right now or found on the internet, except "customizing my mapping", and honestly, this part is absolutely a pain in the .... Am I really forced to do it? I mean, ffs. I have more than 200 fields that would need to be customized, excluded, or whatever else...

Am I REALLY stuck doing that customization? (Not only that, but afterward I would have to figure out a way to reindex the whole thing after creating my customized template...)
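On the 200-field question: a template does not have to list every field. A single dynamic template can give every new string field the same mapping, and only the handful of fields that need a special type get an explicit entry. A sketch that builds such a template body in Python, assuming Elasticsearch 2.x syntax (type "string" with "not_analyzed"; 5.x and later use "keyword", and 6.x and later use "index_patterns" instead of "template"). The fortigate-* pattern and the srcip field are hypothetical examples:

```python
def fortigate_template(index_pattern="fortigate-*", overrides=None):
    """Build a 2.x-style index template body (hypothetical names).

    One dynamic template maps every new string field as not_analyzed with
    doc_values, so the ~200 fields do not each need a hand-written mapping.
    `overrides` holds only the few fields you want to type explicitly.
    """
    properties = dict(overrides or {})
    return {
        "template": index_pattern,  # 2.x key; 6.x+ uses "index_patterns": [...]
        "mappings": {
            "_default_": {  # applies to every document type in matching indices
                "dynamic_templates": [
                    {
                        "strings_not_analyzed": {
                            "match_mapping_type": "string",
                            "mapping": {
                                "type": "string",
                                "index": "not_analyzed",
                                "doc_values": True,
                            },
                        }
                    }
                ],
                "properties": properties,
            }
        },
    }


# Example: everything dynamic, except one field typed explicitly as an IP address.
template = fortigate_template(overrides={"srcip": {"type": "ip"}})
```

The body would be sent with PUT _template/<name>. Templates are applied only when an index is created, so with daily indices the next day's index picks up the new mapping; older days do not have to be reindexed unless you need them to match.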

See, my problem is... it is a permanently running machine. It will keep receiving logs from the FortiGate (FTG) device and indexing them, again and again; it will never stop. So I cannot stop the Logstash instance or anything else to free up resources that might be missing, and neither can I stop the indexing.

Please help me... I'm literally close to losing my mind playing with those "freaking" templates...


(Mark Walkom) #2

Sounds like you need more resources. Either more nodes in the cluster or a larger node.

Everything comes at a cost.


(Alex) #3

Tried to add more resources...

I currently have one more node. Unfortunately, although it is almost identical to the master... I keep getting OOM?


(Alex) #4

???? Anyone?

Please. Pretty please, lol?


#5

I've spent the last year and a half trying to squeeze as much performance as I can out of a single machine running ES, so I feel your pain and wish there were more information about this. ES seems to assume a cluster/cloud rather than big iron/appliance.

Your JVM instances (presumably those running ES) are running out of memory and are getting killed by the Linux kernel. One thing you should check is how much heap is being allocated to your ES instances.

If you are running more than one instance, you need to ensure that the total allocated is less than your entire machine, say 6GB each for two instances. In the above screenshot there are at least 5 different PIDs associated with a JVM, so there may be multiple instances of ES running; make sure the right number are running. On a 16GB system, however, I do not know why you would run more than one instance of ES.

The process hangs that you see are likely the result of the out-of-memory condition, but could also be caused by an I/O problem.

Based on the use case you described, I think you may be using undersized hardware.


(Alex) #6

I run only one instance of Elasticsearch, with 5 shards (the default) and no replicas.

What information would you need to help me diagnose this as well as we can?

I use mlockall: true, and the heap size is 8192 MB (half of the 16 GB of RAM on the system).
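For reference, the 8192 MB figure matches the usual rule of thumb in the Elasticsearch docs: give the heap no more than half of physical RAM (the rest serves the OS filesystem cache) and keep it below roughly 32 GB so the JVM can use compressed object pointers. A quick sketch of that rule for a single instance per machine; the 31 GB cap is a conservative assumption, and on 2.x the resulting value is typically passed via ES_HEAP_SIZE:

```python
def recommended_heap_mb(ram_mb, cap_mb=31 * 1024):
    """Rule-of-thumb heap size for one ES instance on a machine with `ram_mb` of RAM.

    Half of RAM goes to the heap, the other half is left to the OS page cache;
    the cap keeps the heap under the ~32 GB compressed-oops threshold.
    """
    return min(ram_mb // 2, cap_mb)


print(recommended_heap_mb(16 * 1024))  # the 16 GB machine from this thread
```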

Now it's two nodes: the first has 8 CPUs, 16 GB of RAM and a 400 GB hard drive; the second is a clone of the first, but with 4 CPUs and a 200 GB hard drive.

Thank you for replying.


#7

I would recommend running some monitoring tools like top(1), htop(1), free(1), and ps(1). Find out which processes are eating up memory. Is a stale Elasticsearch process still around that should not be? Do you have an indexing process that is eating a lot of memory?

Also, you should check what the JVM is using memory for. If you aren't already using doc_values, it's quite possible that a good amount of your memory is being used up by fielddata. Use the various ES APIs (_cat/health, _nodes/stats, _cluster/stats) to see how much heap is used and by what. Also check indexing memory and how much memory your open segments are taking up.
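To see what the heap is actually being used for, the node stats response can be reduced to the numbers mentioned above. A minimal sketch, assuming the JSON from GET _nodes/stats/jvm,indices has already been fetched and parsed into a dict; the key names follow the node stats API, and anything missing defaults to 0:

```python
MB = 1024 * 1024


def heap_breakdown(nodes_stats):
    """Summarize a parsed _nodes/stats response: heap use, fielddata, segments per node."""
    summary = {}
    for node_id, node in nodes_stats.get("nodes", {}).items():
        jvm_mem = node.get("jvm", {}).get("mem", {})
        indices = node.get("indices", {})
        summary[node.get("name", node_id)] = {
            "heap_used_percent": jvm_mem.get("heap_used_percent", 0),
            "heap_used_mb": jvm_mem.get("heap_used_in_bytes", 0) // MB,
            # memory held by fielddata (what doc_values avoids for most fields)
            "fielddata_mb": indices.get("fielddata", {}).get("memory_size_in_bytes", 0) // MB,
            # memory held by open Lucene segments
            "segments_mb": indices.get("segments", {}).get("memory_in_bytes", 0) // MB,
        }
    return summary
```

For example, json.load(urllib.request.urlopen("http://localhost:9200/_nodes/stats/jvm,indices")) against a local node (host and port are assumptions) gives a dict that can be passed straight in. If fielddata makes up most of a consistently high heap_used_percent, that points back at the doc_values question.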

If you are running sysstat, sar(1) can help you track down long term trends in CPU, memory, and disk utilization. (sar -q, sar -r, and sar -d).


(Mark Walkom) #8

I'd reduce your shard count, you're probably wasting resources there.


(Alex) #9

How would someone go about doing that? And how many shards should one use for this specific use case?


(Mark Walkom) #10

You need to reindex to a new index with fewer shards.
You can use something like https://gist.github.com/markwalkom/8a7201e3f6ea4354ae06, or https://www.elastic.co/guide/en/elasticsearch/reference/2.3/docs-reindex.html if you are on 2.3.
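Put together, the steps are: create the new index with the shard count you want (in 2.x the number of primary shards cannot be changed on an existing index), then copy the documents into it. A sketch that just lays out the two requests for the 2.3 _reindex API; the index names are hypothetical, and replicas default to 0 to match the current no-replica setup:

```python
def shrink_plan(old_index, new_index, shards=1, replicas=0):
    """Return (method, path, body) requests that copy an index into one with fewer shards."""
    # Step 1: create the target index with the desired shard count.
    create = ("PUT", f"/{new_index}", {
        "settings": {"number_of_shards": shards, "number_of_replicas": replicas}
    })
    # Step 2: copy every document from the old index into the new one.
    reindex = ("POST", "/_reindex", {
        "source": {"index": old_index},
        "dest": {"index": new_index},
    })
    return [create, reindex]


plan = shrink_plan("fortigate-2016.05.01", "fortigate-2016.05.01-1shard")
```

Each tuple can be sent with curl or any HTTP client. Once the copy is verified, the old index can be deleted, or an alias pointed at the new one.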


(Russ Cam) #11

I think with that amount of data per day (25g = 25 gigabytes?) on a single machine with 16 GB of RAM and 4 CPUs, you're going to have a hard time with any data store. You're going to need more resources. How much more? Well, that's a little tricky to estimate without knowing

  1. The mapping for your log documents
  2. The byte size of each log document (in order to calculate bytes indexed/sec)
  3. Whether logs are ingested at a constant, uniform rate or are spiky
  4. The expectations with regard to operational load on the cluster, e.g. concurrent searches, aggregations, etc.

It's usually useful to do some rough estimations for these and provision a larger cluster than your estimate, then scale it down to the point where the performance meets your requirements. The alternative way is to start with a smaller cluster and gradually scale it up until the performance is satisfactory, but from experience, it can take longer going in this direction.
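For point 2, a rough starting number comes straight from the daily volume. A sketch of that arithmetic, assuming decimal gigabytes and a perfectly uniform rate; the 500-byte average document size in the example is made up and should be replaced with a measured value:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def avg_ingest_rate(bytes_per_day, avg_doc_bytes=None):
    """Average ingest rate implied by a daily volume.

    Returns (bytes_per_sec, docs_per_sec); docs_per_sec is None when no
    average document size is given. Real traffic peaks above this average.
    """
    bytes_per_sec = bytes_per_day / SECONDS_PER_DAY
    docs_per_sec = bytes_per_sec / avg_doc_bytes if avg_doc_bytes else None
    return bytes_per_sec, docs_per_sec


# 25 GB/day with an assumed 500-byte average document
rate, docs = avg_ingest_rate(25e9, avg_doc_bytes=500)
```

That works out to about 289,352 bytes/s (roughly 0.29 MB/s) and about 579 docs/s; at 100 GB/day the byte rate is about 1.16 MB/s. Spiky ingestion (point 3) means the peaks will sit well above these averages.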

You should consider using Index Templates to define a mapping for your logs. As @warkolm mentioned, 5 shards for a daily log index of 25 GB is probably too many; you could probably have just 1 primary shard per daily index (with at least 1 replica), but test this with your data and hardware to see if it meets your expectations.
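The shard-count advice can be written as a small rule: divide the daily volume by a target maximum shard size and round up. A sketch, where the 30 GB target is an assumption to tune against your own data and hardware rather than an official limit:

```python
import math


def primary_shards(daily_gb, max_shard_gb=30):
    """Primary shards for one daily index so no shard exceeds max_shard_gb (assumed target)."""
    return max(1, math.ceil(daily_gb / max_shard_gb))


def daily_index_settings(daily_gb, replicas=1):
    """Settings for one daily index, with at least 1 replica as suggested above."""
    return {
        "number_of_shards": primary_shards(daily_gb),
        "number_of_replicas": replicas,
    }
```

With these numbers, 25 GB/day gives 1 primary shard and 100 GB/day gives 4. The settings dict could go into the "settings" section of an index template so every new daily index gets it; note that a replica can only be allocated once there is a second node.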

Consider snapshotting your cluster as well as putting some kind of queueing system e.g. Kafka in front of your cluster so you have some reprieve to make some changes.


(Alex) #12

OK,

so it really is that painful. I have a call scheduled with the Elastic team to size my setup and ask for support... I'm done fuc**ng around. I just hope I can pay X thousand and finally get it done without having to struggle day after day with forums, haha.

Thank you for replying, people.

Maybe it is just not suitable for my needs...


#13

Would be very interested to know whether your conversation with the paid support enabled you to meet your needs.


(Alex) #14

Unfortunately, it didn't.

Going to SIEMonster now.
The conversation only let me find out the "ratio" of RAM needed vs. data.

Long story short, it's 1:16.
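Taking that 1:16 RAM-to-data ratio at face value, the RAM needed for the 30-day goal from the first post is simple arithmetic. A sketch, assuming the daily log volume approximates the on-disk index size and each replica counts as a full extra copy:

```python
def ram_needed_gb(gb_per_day, retention_days, ram_to_data=16, replicas=0):
    """RAM needed to hold `retention_days` of data at a RAM:data ratio of 1:ram_to_data."""
    data_gb = gb_per_day * retention_days * (1 + replicas)  # total data kept
    return data_gb / ram_to_data


today = ram_needed_gb(25, 30)    # current volume, 30 days visible
future = ram_needed_gb(100, 30)  # expected volume, 30 days visible
```

That is 750 GB of data, or about 46.9 GB of RAM, at today's 25 GB/day, and 3,000 GB of data, or 187.5 GB of RAM, at 100 GB/day, before any replicas. By the same ratio, a single 16 GB machine covers about 256 GB of data.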


(Mark Walkom) #15

I'm surprised we weren't able to help sizing things out. Would you be willing to PM me more details of who you spoke to so I can follow it up internally?

