2nd stage filtering


(Morten Bjoernsvik) #1

Hi

I load logs into Elasticsearch through the _bulk endpoint and manage around 10K lines/sec. My regexp filtering is very modest: it only extracts @timestamp, operationid and log. If I add more elaborate filtering, the insert rate goes down. (The server can deliver 80K lines/sec.)

Is there a way to re-filter this index later on? The only identifier I have is a running count; creating UUIDs or a unique key from each log line takes too long.

Perfect one-stage parsing leaves me at 400 lines/sec.
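
A minimal sketch of the cheap first stage described above, assuming a hypothetical line layout of timestamp, operation ID, then free text, and type-less bulk actions (Elasticsearch 7 or later). The names LINE_RE and to_bulk_body and the index name "logs" are made up for illustration. The running count is used as the document _id, so a later pass can address the same documents again:

```python
import json
import re

# Hypothetical line layout: "<timestamp> <operationid> <free text>".
LINE_RE = re.compile(r"^(?P<ts>\S+)\s+(?P<opid>\S+)\s+(?P<log>.*)$")


def to_bulk_body(lines, index="logs", start_id=0):
    """Build an NDJSON body for the _bulk endpoint from raw log lines."""
    out = []
    for n, line in enumerate(lines, start=start_id):
        m = LINE_RE.match(line.rstrip("\n"))
        if m is None:
            continue  # skip lines the cheap regexp cannot handle
        # Action line: the running count doubles as the document _id.
        out.append(json.dumps({"index": {"_index": index, "_id": str(n)}}))
        # Source line: only the three cheap fields; "log" stays raw.
        out.append(json.dumps({
            "@timestamp": m.group("ts"),
            "operationid": m.group("opid"),
            "log": m.group("log"),
        }))
    # _bulk expects a newline after every line, including the last one.
    return ("\n".join(out) + "\n") if out else ""
```

The returned string would be sent as the body of a POST to _bulk with the Content-Type application/x-ndjson. Keeping "log" unparsed is what leaves room for a second stage.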


(Christian Dahlqvist) #2

Are you using ingest node to parse the data? What is the size and specification of your cluster?
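
For reference, one way to do a second pass with an ingest pipeline is to index only the cheap fields first and later push the existing documents through a pipeline with _update_by_query. A rough sketch, assuming an index called "logs", a pipeline ID "second_stage" and a grok pattern that are all invented for illustration. The code only builds the two requests and does not send them:

```python
import json


def second_stage_requests(index="logs", pipeline_id="second_stage"):
    """Return (method, path, body) tuples for a second parsing pass."""
    # Hypothetical pipeline: pull extra fields out of the raw "log" text.
    pipeline = {
        "description": "second-stage parsing of the raw log field",
        "processors": [
            {
                "grok": {
                    "field": "log",
                    "patterns": ["%{WORD:action} %{NUMBER:duration_ms:int}ms"],
                    # Leave documents untouched when the pattern does not match.
                    "ignore_failure": True,
                }
            }
        ],
    }
    # Only reprocess documents that do not have the new field yet.
    selector = {
        "query": {"bool": {"must_not": {"exists": {"field": "action"}}}}
    }
    return [
        ("PUT", f"/_ingest/pipeline/{pipeline_id}", pipeline),
        (
            "POST",
            f"/{index}/_update_by_query?pipeline={pipeline_id}"
            "&wait_for_completion=false",
            selector,
        ),
    ]


for method, path, body in second_stage_requests():
    print(method, path)
    print(json.dumps(body, indent=2))
```

The parsing still costs CPU on the cluster, so this mainly helps if the second pass can run when ingest load is low.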


(Morten Bjoernsvik) #3

Hi

No ingest node, just pre-filtering in Python. It is a single node in Docker with 64 GB RAM and 1 TB on 4 striped SSDs. I will try to split the work up and run it in parallel, since insertion across the network easily handles >10K lines/sec, but I'm CPU and memory bound on the server. We tried rsyslog with processing directly on the ES server, but rsyslog only manages around 3000 lines/sec. _bulk is much faster.

Some ideas on best practices would be nice.
Thanks
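
The plan to split the Python pre-filtering and run it in parallel could look roughly like this. It is a sketch under assumptions: the line layout, the regexp and the names chunked, parse_chunk and parse_parallel are hypothetical, and the heavier parsing would replace the simple regexp shown here.

```python
import re
from itertools import islice
from multiprocessing import Pool

# Hypothetical line layout: "<timestamp> <operationid> <free text>".
LINE_RE = re.compile(r"^(?P<ts>\S+)\s+(?P<opid>\S+)\s+(?P<log>.*)$")


def chunked(iterable, size):
    """Yield lists of at most `size` items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def parse_chunk(lines):
    """Parse one chunk of lines; runs inside a worker process."""
    docs = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m:
            docs.append({
                "@timestamp": m["ts"],
                "operationid": m["opid"],
                "log": m["log"],
            })
    return docs


def parse_parallel(lines, workers=4, chunk_size=5000):
    """Fan chunks out to worker processes; yield parsed chunks in order."""
    with Pool(processes=workers) as pool:
        # imap preserves input order and yields results lazily.
        yield from pool.imap(parse_chunk, chunked(lines, chunk_size))
```

Each list yielded by parse_parallel could be sent as one _bulk request, so the chunk size also sets the bulk size. On platforms that start workers by spawning a fresh interpreter, parse_parallel should be called from under an `if __name__ == "__main__":` guard.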