TLS errors, max mem, max clients... sudden death everywhere!

(Alejandro Olivan) #1

Hi again forum!

...I'm sad, as my brand new toy lasted very little time... thank God I kept the old stuff on.
As I decided that ELK may still be suitable for giving IT staff nice stats on services, I started creating my filters, got my Elasticsearch mini cluster up, and so on... but as I added a few logstash-forwarder clients with several services, problems arose very quickly...

  • First, logstash was stopping with out-of-memory errors... increasing the max mem limit is only a stopgap, so I switched my Debian repo from 1.4 to 1.5 and upgraded, since I read that there is a memory leak on TCP connections (I use them) that is fixed in 1.5.
  • Second, upgrading was not clean... since I had added some patterns to the patterns dir, there were problems deleting the old folders (fortunately!!!! that saved my filters!!!!) and the upgrade was a little bit bitter.
  • Third, as I recovered from the memory problems (I increased the mem limit anyway), there seem to be connection problems from the LSF clients... the max_connection limit appears in the log... and I'm starting to become worried...
  • Trying to find where the connection limit is defined, I ended up discovering it is defined nowhere...
  • Out of desperation, I read that increasing threads may solve the problem... so I added -w X to the init.d script, since I'm unable to find the thread parameter anywhere in the /etc/default/ file... and tried again.
  • The problem persists... now it is TLS handshake errors everywhere...
  • In addition, since I started using -w 8, start/stop is horrible...

So... as you can read, it is all out of service... very, very disappointing and, of course, far from being considered for production... but I read about people that have been using it on thousands of servers!!! Why is it now not possible to run more than just 10 or 12 servers? I have plenty of CPU and RAM everywhere... and the datacenter eth connection is gigabit everywhere... in all, no system is overloaded...

So a massive, full failure is really strange... do you believe I'm missing something? It all seems like a nightmare!

Best regards

(Alejandro Olivan) #2

Is it possible to have several lumberjack inputs listening on different ports???

It may be a way to deal with this mess...

(Magnus Bäck) #3

Yes, that should work just fine.
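
As a minimal sketch of what that could look like: two lumberjack inputs in the same config, each bound to its own port. The port numbers, certificate paths, and type values below are placeholders chosen for illustration, not taken from any real setup, and the same certificate is assumed to be shared by both inputs.

```
input {
  # First listener; logstash-forwarder clients in group A point here.
  lumberjack {
    port            => 5043                                        # hypothetical port
    ssl_certificate => "/etc/logstash/ssl/logstash-forwarder.crt"  # hypothetical path
    ssl_key         => "/etc/logstash/ssl/logstash-forwarder.key"  # hypothetical path
    type            => "lsf-group-a"                               # hypothetical type tag
  }

  # Second listener on a different port for group B.
  lumberjack {
    port            => 5044                                        # hypothetical port
    ssl_certificate => "/etc/logstash/ssl/logstash-forwarder.crt"
    ssl_key         => "/etc/logstash/ssl/logstash-forwarder.key"
    type            => "lsf-group-b"
  }
}
```

Both blocks have to live inside an input section of a file that logstash actually loads. After a restart, each port should show up as a TCP listener, which can take a little while because of the slow startup.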

(Alejandro Olivan) #4

Aha... I tried it but it didn't work... i.e. the syntax I used was not flagged as wrong and the service started, but by issuing netstat -tulnp I realized nothing was listening on the port.
This is something I didn't find anywhere on Google... not a single example, post or howto... very strange.
Maybe you could post an example of how to do it... I just copy-pasted my single lumberjack input into a new one and changed the port number... no complaints, but no joy.

So... I returned to my Debianist philosophy: I downgraded from bleeding-edge 1.5 to the last 1.4 release (which I consider stable enough).
I'm installing a redis server.
I will drop the "sid-like/testing" logstash-forwarder marvels and replace them with logstash 1.4 on every reporting server.
So, I will distribute log processing at the origin end, sending to a common redis broker server, and use a single logstash server at the receiving end to ingest the pre-formatted/grokked data from redis and just inject it into Elasticsearch.
If possible, I would stay away from TLS.
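
A rough sketch of that shipper/broker/indexer layout, assuming 1.4-era option names for the redis and elasticsearch plugins. The host names, the redis key, the log path, and the grok pattern choice are all hypothetical placeholders.

```
# shipper.conf: runs on every reporting server, parses locally, pushes to redis
input {
  file {
    path => "/var/log/syslog"      # hypothetical log file
    type => "syslog"
  }
}
filter {
  grok {
    match => { "message" => "%{SYSLOGLINE}" }
  }
}
output {
  redis {
    host      => ["redis.example.local"]   # hypothetical broker host
    data_type => "list"
    key       => "logstash"                # hypothetical list key
  }
}
```

```
# indexer.conf: runs on the single central logstash server
input {
  redis {
    host      => "redis.example.local"     # same hypothetical broker
    data_type => "list"
    key       => "logstash"                # must match the shipper's key
  }
}
output {
  elasticsearch {
    host     => "es.example.local"         # hypothetical Elasticsearch node
    protocol => "http"
  }
}
```

With events already grokked on the shippers, the indexer needs no filter section, so the central server only moves data from the redis list to Elasticsearch.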

This is something I have seen in several old howtos around... I bet it will be more robust and scalable.
I'm also scaling up ES, implementing multiple master nodes and multiple data/client nodes (I'm still thinking about how to balance them... I'm thinking of nginx)... so the creature is exploding in complexity... I hope it all pays off the effort.

Best regards!
