I read logs into Elasticsearch using the _bulk endpoint and manage around 10K lines/sec. My regexp filtering is very modest, extracting just @timestamp, operationid and log; if I add more elaborate filtering, the insert rate goes down. (The server can deliver 80K lines/sec.)
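For reference, here is a minimal sketch of this kind of Python prefilter. One precompiled regular expression pulls out the three fields, and each line becomes an action/source pair in the newline-delimited _bulk body. The log line layout, the `LINE_RE` pattern and the index name `logs` are hypothetical assumptions, since the real format isn't shown in the post.

```python
import json
import re

# Hypothetical log layout: "<timestamp> <operationid> <free-text message>".
# Adjust to the real format; compiling once avoids recompiling per line.
LINE_RE = re.compile(r"^(?P<ts>\S+)\s+(?P<opid>\S+)\s+(?P<log>.*)$")


def to_bulk_body(lines, index="logs"):
    """Build an NDJSON _bulk body; lines that don't match are skipped."""
    out = []
    # No "_id" in the action: Elasticsearch assigns one automatically.
    action = json.dumps({"index": {"_index": index}})
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue
        doc = {"@timestamp": m["ts"], "operationid": m["opid"], "log": m["log"]}
        out.append(action)
        out.append(json.dumps(doc))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(out) + "\n" if out else ""
```

The resulting string can be POSTed to `/_bulk` with `Content-Type: application/x-ndjson`. Leaving out `_id` lets Elasticsearch auto-generate IDs, which is generally cheaper at index time than supplying your own.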
Is there a way to refilter this index later on? I only have a counter as the index; creating UUIDs or a unique key for each log line takes too long.
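One way to refilter later without generating your own keys: every document already carries the `_id` Elasticsearch assigned at index time. A second pass can read documents back (for example with the Python client's `elasticsearch.helpers.scan`) and send partial-document `update` actions keyed on that `_id`. The sketch below only builds those actions from already-fetched hits; the `EXTRA_RE` pattern and the `status`/`duration_ms` fields are hypothetical.

```python
import re

# Hypothetical second-stage pattern: pulls "status=<word>" and "took=<n>ms"
# out of the already-stored "log" field.
EXTRA_RE = re.compile(r"status=(?P<status>\w+).*?took=(?P<ms>\d+)ms")


def update_actions(hits):
    """Yield partial-update actions (elasticsearch-py helpers.bulk format)
    for hits whose stored "log" field matches EXTRA_RE."""
    for hit in hits:
        m = EXTRA_RE.search(hit["_source"].get("log", ""))
        if m is None:
            continue  # leave non-matching documents untouched
        yield {
            "_op_type": "update",
            "_index": hit["_index"],
            "_id": hit["_id"],  # the ID Elasticsearch assigned at index time
            "doc": {"status": m["status"], "duration_ms": int(m["ms"])},
        }
```

Against a live cluster this would be wired up roughly as `helpers.bulk(es, update_actions(helpers.scan(es, index="logs")))`, run later or from another machine so it doesn't slow the initial ingest. Keep in mind that an update rewrites the whole document internally, so this pass costs about as much as indexing the document again.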
Doing the full parsing in a single stage drops me to 400 lines/sec.
No ingest node, just prefiltering in Python. It is a single node in Docker with 64 GB RAM and 1 TB striped across 4 SSDs. I will try to split the work up and run it in parallel, since insertion across the network easily handles >10K lines/sec, but I'm CPU- and memory-bound on the server. We tried rsyslog, doing the processing directly on the ES server, but rsyslog only manages around 3,000 lines/sec. _bulk is much faster.
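Since the Python prefilter is CPU-bound, the split-it-up idea can be sketched with the standard library's `multiprocessing.Pool`: separate processes side-step the GIL, workers turn batches of lines into ready-made _bulk bodies, and the parent only ships them. The line layout, index name, batch size, worker count and the helper names `batches`, `parse_batch` and `parallel_bodies` are illustrative assumptions.

```python
import json
import re
from itertools import islice
from multiprocessing import Pool

# Hypothetical line layout: "<timestamp> <operationid> <message>".
LINE_RE = re.compile(r"^(\S+)\s+(\S+)\s+(.*)$")
ACTION = json.dumps({"index": {"_index": "logs"}})  # hypothetical index name


def batches(lines, size):
    """Split an iterable of lines into lists of at most `size` lines."""
    it = iter(lines)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def parse_batch(batch):
    """Worker: turn one batch of raw lines into an NDJSON _bulk body."""
    out = []
    for line in batch:
        m = LINE_RE.match(line)
        if m:
            ts, opid, log = m.groups()
            out.append(ACTION)
            out.append(json.dumps({"@timestamp": ts, "operationid": opid, "log": log}))
    return "\n".join(out) + "\n" if out else ""


def parallel_bodies(lines, workers=8, batch_size=5000):
    """Parse batches on `workers` processes; yield bodies as they finish."""
    with Pool(processes=workers) as pool:
        # With auto-generated IDs the order of bulk requests doesn't matter,
        # so imap_unordered keeps all workers busy.
        # Note: the pool reads ahead from `lines` without a bound; for very
        # large inputs, feed it in slices to cap memory use.
        for body in pool.imap_unordered(parse_batch, batches(lines, batch_size)):
            if body:
                yield body
```

Each yielded body can then be POSTed to `/_bulk` from the parent process (or a few sender threads, since that part is network-bound). Because sending over the network is not the bottleneck, this parsing could also run on a different machine than the Elasticsearch node, leaving the server's CPU for indexing.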
Any best-practice suggestions would be appreciated.
Thanks